This scraper is written in **Go** instead of Python due to **Upwork's bot detection techniques**, which rely on analyzing TLS signatures of incoming requests ([explained here](https://scrapfly.io/blog/how-to-avoid-web-scraping-blocking-tls/)).
> At the time of development (3 years ago), I could not find an HTTP client in Python that could accurately mimic browser-like TLS signatures. This has since changed: modern Python libraries now support lower-level TLS emulation through bindings to native clients written in Go/C++.
1. **Set Up Credentials**
Create a `.env` file inside the `upwork/` directory and add your **authorization** and **cookie** headers.
> 💡 The easiest way to obtain these is by logging into [Upwork.com](https://www.upwork.com), inspecting **network traffic** in DevTools after passing bot checks, and copying the `Authorization` and `Cookie` headers from any authenticated request.
2. **Keep Credentials Fresh**
These credentials usually expire daily, so you'll need to refresh them if you scrape regularly.
3. **Run the Scraper**
In the `main.go` file, call:
```go
p.Run("<keyword>")
```
* Replace `<keyword>` with the term you want to search.
* It will fetch the **top 5000 most recent job postings** that match the keyword and save them in a `.jsonl` file.
* If you pass an empty string (`p.Run("")`), it will fetch the **most recent jobs** regardless of keyword.
> ⚠️ **Important**: Never use your **personal Upwork account** to extract credentials. Doing so **will result in account suspension**.
---
## Dataset
A dataset created using this scraper is available on **Kaggle**:
📊 **[Upwork Jobs Dataset on Kaggle](https://www.kaggle.com/datasets/hashiromer/upwork-jobs)**