This scraper is written in **Go** instead of Python due to **Upwork's bot detection techniques**, which rely on analyzing TLS signatures of incoming requests ([explained here](https://scrapfly.io/blog/how-to-avoid-web-scraping-blocking-tls/)).
> At the time of development (3 years ago), I could not find an HTTP client in Python that could accurately mimic browser-like TLS signatures. This has since changed: modern Python libraries now support lower-level TLS emulation through bindings to native clients written in Go/C++.
1. **Set Up Credentials**
Create a `.env` file inside the `upwork/` directory and add your **authorization** and **cookie** headers.
> 💡 The easiest way to obtain these is by logging into [Upwork.com](https://www.upwork.com), inspecting **network traffic** in DevTools after passing bot checks, and copying the `Authorization` and `Cookie` headers from any authenticated request.
2. **Keep Credentials Fresh**
These credentials usually expire daily, so you'll need to refresh them if you scrape regularly.
3. **Run the Scraper**
In the `main.go` file, call:
```go
p.Run("<keyword>")
```
* Replace `<keyword>` with the term you want to search.
* It will fetch the **top 5000 most recent job postings** that match the keyword and save them in a `.jsonl` file.
* If you pass an empty string (`p.Run("")`), it will fetch the **most recent jobs** regardless of keyword.
> ⚠️ **Important**: Never use your **personal Upwork account** to extract credentials. Doing so **will result in account suspension**.
---
## Dataset
A dataset created using this scraper is available on **Kaggle**:
📊 **[Upwork Jobs Dataset on Kaggle](https://www.kaggle.com/datasets/hashiromer/upwork-jobs)**