Upwork Jobs Scraper
This project uses Upwork's internal API to scrape newly posted jobs. It avoids using a headless browser for two key reasons:
- Efficiency: API requests are far less resource-intensive than browser automation.
- Simplicity: The API returns clean JSON, eliminating the need for HTML parsing and simplifying downstream processing.
Why Golang?
This scraper is written in Go instead of Python due to Upwork's bot detection techniques, which rely on analyzing TLS signatures of incoming requests (explained here).
At the time of development (3 years ago), I could not find an HTTP client in Python that could accurately mimic browser-like TLS signatures. This has since changed as modern Python libraries now support lower-level TLS emulation through bindings to native clients written in Go/C++.
Usage
-
Add Credentials Create a
.envfile inside theupwork/directory. Add your authorization and cookie headers.💡 The easiest way to obtain these is by logging into Upwork.com, inspecting network traffic in DevTools after passing bot checks, and copying the
AuthorizationandCookieheaders from any authenticated request. -
Keep Credentials Fresh These credentials usually expire daily, so you'll need to refresh them if scraping on a regular basis.
-
Run the Scraper In the
main.gofile, call:p.Run("<keyword>")- Replace
<keyword>with the term you want to search. - It will fetch the top 5000 most recent job postings that match the keyword and save them in a
.jsonlfile. - If you pass an empty string (
p.Run("")), it will fetch the most recent jobs regardless of keyword.
- Replace
⚠️ Important: Never use your personal Upwork account to extract credentials. Doing so will result in account suspension.
Dataset
A dataset created using this scraper is available on Kaggle:
📊 Upwork Jobs Dataset on Kaggle
Contributing
This project is no longer actively maintained, but I occasionally check if it still works (as of June 27, 2025, it does).
Potential contributions:
- Automating the refresh of
AuthorizationandCookieheaders. - Adding support for advanced filters or scraping job details.