Christian Kellner ac02817d4e Switch browser engine from puppeteer-extra/stealth to CloakBrowser (#307)
* Switch browser engine from puppeteer-extra/stealth to CloakBrowser

- Replace puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth with
  cloakbrowser + puppeteer-core; CloakBrowser applies 49 source-level C++
  fingerprint patches that cannot be detected at the JS layer.
- Enable humanize:true in launchBrowser() for Bézier mouse curves, natural
  keyboard timing, and realistic scroll physics.
- Remove manual userDataDir management and ARM64 executablePath override;
  CloakBrowser ships its own binary for x86_64 and arm64.
- Proxy is now passed via CloakBrowser's native proxy option instead of
  --proxy-server Chrome flag.
- Dockerfile: add fonts-noto-color-emoji + fonts-freefont-ttf so canvas
  fingerprint hashes match real browsers (required for Kasada/Akamai);
  replace npx puppeteer browsers install with node ensureBinary() call;
  remove TARGETARCH ARG and ARM64 system-Chromium branch.
- Update test mock to reflect simplified browser object (no __fredy_* fields).

https://claude.ai/code/session_01WXzA3orbwE2hdk723c6MgH

* Add --ignore-certificate-errors for CloakBrowser's custom Chromium

CloakBrowser ships its own Chromium binary with an independent CA bundle.
This flag prevents ERR_CERT_AUTHORITY_INVALID failures in environments with
SSL-inspecting proxies or non-standard root CAs (Docker CI, corporate networks).

* Harden CloakBrowser integration and fix kleinanzeigen detail test

- Remove all CDP overrides (applyBotPreventionToPage, applyLanguagePersistence,
  applyPostNavigationHumanSignals) that created detectable inconsistencies on top
  of CloakBrowser's C++ patches; pass locale to CloakBrowser launch instead
- Drop --lang arg (replaced by CloakBrowser locale flag)
- Extend immowelt puppeteerTimeout to 90 s to accommodate React SPA rendering
  latency under CloakBrowser's humanize delays
- Fix kleinanzeigen detail test: serve the offline fixture for the search URL
  so only individual detail pages are fetched live, avoiding rate limiting from
  a second fresh session hitting the same search endpoint

* Fix immowelt bot detection with two-phase navigation and fixture-backed detail test

Immowelt's CDN challenges cold browser sessions before React can render the
listing grid, causing the old waitForSelector approach to silently time out.

- Add preNavigateUrl option to puppeteerExtractor: visits a warm-up page
  first so the site sees an established session before the search URL
- Add waitForNetworkIdle option: a second idle-wait phase after domcontentloaded
  that catches React's listing API round-trip (which fires long after the
  initial HTML is parsed); errors are swallowed so partial DOM is still used
- Switch immowelt config to waitForSelector=null + networkidle warm-up so
  page.content() is returned after the SPA has loaded its data
- Set immowelt preNavigateUrl to the homepage to warm the session
- In the detail enrichment test, spy on puppeteerExtractor to serve the
  offline fixture for the search URL; only individual listing detail pages
  are fetched live (they are far less aggressively protected)
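
The two-phase navigation described above can be sketched roughly as follows. This is a minimal illustration, not the actual puppeteerExtractor code: the function name fetchRenderedHtml, the 500 ms idle time, and the default timeout are assumptions, and `page` stands for any object exposing Puppeteer's Page methods goto(), waitForNetworkIdle() and content().

```javascript
// Sketch only: option names mirror the commit message; the function name,
// idle time and default timeout are illustrative assumptions.
async function fetchRenderedHtml(page, url, options = {}) {
  const { preNavigateUrl = null, waitForNetworkIdle = false, timeout = 90_000 } = options;

  // Phase 1: visit a warm-up page so the site sees an established session.
  if (preNavigateUrl) {
    await page.goto(preNavigateUrl, { waitUntil: 'domcontentloaded', timeout });
  }

  // Phase 2: load the actual search URL.
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout });

  // Optional second wait so the SPA's data requests can finish.
  // Errors are swallowed on purpose: a partially rendered DOM is still returned.
  if (waitForNetworkIdle) {
    try {
      await page.waitForNetworkIdle({ idleTime: 500, timeout });
    } catch {
      // The network never went idle in time; continue with what was rendered.
    }
  }

  return page.content();
}
```

Because the page object is passed in, the same function can be exercised with a stub in offline tests and with a real browser page in live tests.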

* Ensure CloakBrowser binary is present before any live test runs

Add a Vitest globalSetup that calls ensureBinary() once in the main process
before workers start. Without this, running yarn test on a fresh checkout
(or after the binary cache is cleared) immediately fails every browser-based
test with "Failed to launch the browser process" before any useful output
appears. The setup is a no-op in offline mode and when the binary is already
cached.

* Ensure CloakBrowser binary at startup for non-Docker installs

Direct runs (yarn start:backend) on a fresh checkout have no binary and
only crash when the first scraping job fires. Calling ensureBinary() at
startup downloads it on first run and is instant when already cached.
In Docker it stays a no-op since the binary is pre-baked during docker build.

* Fix --no-zygote comment: ICU crash was corrupted .4 binary, not fd issue

The "Invalid file descriptor to ICU data received" crash seen in Sparkasse
tests was caused by a partially-extracted CloakBrowser .4 binary that
contained only the chrome executable but was missing icudtl.dat and other
resource files. The ensureBinary() function returned this incomplete
installation because latest_version_linux-x64 pointed to .4.

The --no-zygote flag is kept as a safeguard for container environments
with limited kernel namespaces, but the comment now accurately describes
its purpose rather than attributing it to a non-existent fd inheritance issue.

* Add ensureValidBinary() to detect and auto-heal corrupt CloakBrowser installs

CloakBrowser's ensureBinary() only checks that the chrome executable exists,
not that required resource files (icudtl.dat, resources.pak) are present.
A partial extraction — e.g. an interrupted update — can leave a directory
that passes ensureBinary()'s check but causes Chrome to crash immediately
with "Invalid file descriptor to ICU data received".

ensureValidBinary() wraps ensureBinary() with a completeness check:
- If the required resource files are missing it removes the corrupt directory
  and all latest_version* markers, then calls ensureBinary() again so it
  falls back to (or re-downloads) a complete build.
- It pins the validated path via CLOAKBROWSER_BINARY_PATH so CloakBrowser's
  own internal ensureBinary() call inside launch() always uses the same,
  verified binary.

Used in index.js (app startup) and test/globalSetup.js (before live tests).
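
A rough sketch of the wrap-and-heal idea, under stated assumptions: ensureBinary is injected as a parameter and is assumed to resolve to the path of the browser executable, the Linux layout (resource files next to the binary) is assumed, and removal of the latest_version* markers is omitted for brevity. It is not the code from this commit.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Resource files named in the commit message (Linux layout assumed).
const REQUIRED_FILES = ['icudtl.dat', 'resources.pak'];

function isBinaryComplete(binaryPath) {
  const dir = path.dirname(binaryPath);
  return REQUIRED_FILES.every((name) => fs.existsSync(path.join(dir, name)));
}

// `ensureBinary` is a stand-in for the library call; it is assumed to
// resolve to the absolute path of the browser executable.
async function ensureValidBinary(ensureBinary) {
  let binaryPath = await ensureBinary();

  if (!isBinaryComplete(binaryPath)) {
    // Remove the corrupt install, then ask again so a complete build is used.
    fs.rmSync(path.dirname(binaryPath), { recursive: true, force: true });
    binaryPath = await ensureBinary();
    if (!isBinaryComplete(binaryPath)) {
      throw new Error(`Browser install at ${binaryPath} is still incomplete`);
    }
  }

  // Pin the verified path so later launches use the same binary.
  process.env.CLOAKBROWSER_BINARY_PATH = binaryPath;
  return binaryPath;
}
```

Throwing after the second failed check keeps the helper from looping forever when every available build is incomplete.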

* Fix sparkasse detail test: serve search URL from fixture to avoid rate-limiting

The second sparkasse test launched a fresh browser against the live search
endpoint right after the first test already did, leaving the IP in a suspicious
state that caused bot detection or rate-limiting to return empty results.
When getListings() returns nothing, execute() resolves to undefined and
expect(listings).toBeInstanceOf(Array) fails.

Apply the same hybrid fixture approach used by kleinanzeigen and immowelt:
intercept puppeteerExtractor calls whose pathname matches the search URL and
return the offline fixture, while letting individual detail page requests go
live (they are less aggressively rate-limited than the search endpoint).

* Fix sparkasse detail test: shared browser, direct fetchDetails call

Remove the fixture-backed spy — live tests must hit the real server.

Root problem: two cold browser sessions hitting sparkasse in quick succession
triggered bot detection, causing the second search request to return empty
results and execute() to resolve undefined.

Fix:
- One browser launched in beforeAll and reused across both tests, so both
  the search and detail requests come from the same warm session.
- The detail test calls provider.config.fetchDetails() directly on the
  listings returned by the first test instead of re-running the full pipeline.
  This avoids a redundant second scrape of the search page while still
  exercising the live detail endpoint.

* Eliminate fixture spies and double live requests in all provider detail tests

All five provider tests with a 'with provider_details enabled' describe block
were either (a) intercepting the search URL with an offline fixture to avoid
hitting the live server twice, or (b) re-running the full execute() pipeline
with a fresh browser, which triggered rate-limiting / bot detection on the
second cold request.

Pattern applied to all five:
- immowelt, kleinanzeigen, wgGesucht, immobilienDe: launch one browser in
  beforeAll/afterAll, pass it to the first test's Fredy constructor, and call
  provider.config.fetchDetails() directly in the second test using the listings
  and browser already in hand. One warm session, two live endpoints tested.
- immoscout: API-based (no browser), so no browser sharing needed. Second test
  calls provider.config.fetchDetails() directly on liveListings[0] from the
  first test instead of re-querying the search API.

Removed: all readFixture spies, getKnownListingHashesForJobAndProvider mocks,
and the puppeteerExtractorMod imports that were only needed for the spy.

* Fix ensureValidBinary for macOS: platform-aware completeness check

On macOS the CloakBrowser binary lives at:
  ~/.cloakbrowser/chromium-X.Y.Z/Chromium.app/Contents/MacOS/Chromium

path.dirname() gave Contents/MacOS/ — but icudtl.dat and resources.pak
are inside Contents/Frameworks/…, not next to the binary. So the old
code incorrectly flagged every macOS installation as corrupt, deleted only
the MacOS/ subdirectory (not the full versioned dir), then failed again.

Fixes:
- isBinaryComplete: on macOS check for Info.plist and Frameworks/ inside
  Chromium.app/Contents/ instead of looking for Linux resource files next
  to the binary. On Linux/Windows the existing check is unchanged.
- getVersionedDir: resolves the full chromium-X.Y.Z/ directory regardless
  of platform (4 levels up on macOS, 1 on Linux/Windows) so
  removeCorruptInstallation always deletes the entire versioned tree.
- missingDescription: reports the correct missing items per platform.
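
The platform-dependent path handling can be illustrated like this. The function names follow the commit message, but the bodies are a sketch based only on the layout described above; requiredPaths is a hypothetical helper, and the Linux executable name in the usage note is an assumption.

```javascript
const path = require('node:path');

// Levels between the executable and chromium-X.Y.Z/, per the commit message:
//   macOS:         chromium-X.Y.Z/Chromium.app/Contents/MacOS/Chromium -> 4
//   Linux/Windows: the executable sits directly in chromium-X.Y.Z/     -> 1
function getVersionedDir(binaryPath, platform = process.platform) {
  const levels = platform === 'darwin' ? 4 : 1;
  let dir = binaryPath;
  for (let i = 0; i < levels; i++) {
    dir = path.dirname(dir);
  }
  return dir;
}

// Hypothetical helper: the paths a completeness check would look for.
function requiredPaths(binaryPath, platform = process.platform) {
  if (platform === 'darwin') {
    // Two levels up from the executable is Chromium.app/Contents/.
    const contents = path.dirname(path.dirname(binaryPath));
    return [path.join(contents, 'Info.plist'), path.join(contents, 'Frameworks')];
  }
  const dir = path.dirname(binaryPath);
  return [path.join(dir, 'icudtl.dat'), path.join(dir, 'resources.pak')];
}
```

Passing the platform as a parameter makes both branches testable on any machine, for example getVersionedDir('/x/chromium-1.2.3/chrome', 'linux') versus the same call with a Chromium.app path and 'darwin'.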

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-05-10 15:42:31 +02:00

Jetbrains Open Source

Website  |   Demo

Fredy 🏡 Your Self-Hosted Real Estate Finder for Germany

Finding an apartment or house in Germany can be stressful and time-consuming.
Fredy makes it easier: it automatically scrapes ImmoScout24, Immowelt, Immonet, eBay Kleinanzeigen, and WG-Gesucht and notifies you instantly via Slack, Telegram, Email, ntfy, Discord, and more when new listings appear.

With a modern architecture, Fredy provides a clean Web UI, removes duplicates across platforms, and stores results so you never see the same listing twice.


Key Features

  • 🏠 Scrapes ImmoScout24, Immowelt, Immonet, eBay Kleinanzeigen, WG-Gesucht
  • Instant notifications: Slack, Telegram, Email (SendGrid, Mailjet), ntfy, Discord
  • 🔎 Uses the ImmoScout Mobile API (reverse engineered)
  • 🌍 Runs anywhere: Docker, Node.js, self-hosted
  • 🖥️ Intuitive Web UI to manage searches
  • 🎯 Easy to use thanks to a user-friendly Web UI
  • 🔄 Deduplication across platforms
  • ⏱️ Customizable search intervals

🤝 Sponsorship

I maintain Fredy and other open-source projects in my free time.
If you find it useful, consider supporting the project 💙

Fredy is proudly backed by the JetBrains Open Source Support Program.

👨‍🏫 Demo

You can try out Fredy here: Fredy Demo


🚀 Quick Start

With Docker

Note

To start Fredy, you must provide a config.json. As a starting point, use the one in this repo: https://github.com/orangecoding/fredy/blob/master/conf/config.json

docker run -d --name fredy \
  -v fredy_conf:/conf \
  -v fredy_db:/db \
  -p 9998:9998 \
  ghcr.io/orangecoding/fredy:master

Logs:

docker logs fredy -f

Manual (Node.js)

  • Requirement: Node.js 22 or higher
  • Install dependencies and start:
yarn
yarn run start:backend   # in one terminal
yarn run start:frontend  # in another terminal

👉 Open http://localhost:9998

With Unraid

Should you use Unraid, you can now install Fredy from the community store :)

Default Login:

  • Username: admin
  • Password: admin

📸 Screenshots

[Screenshots: Fredy Maps View · Dashboard (job configuration) · Found Listings]

🧩 Core Concepts

Fredy is built around three simple concepts:

Provider 🌐

A provider is a real-estate platform (e.g. ImmoScout24, Immowelt, Immonet, eBay Kleinanzeigen, WG-Gesucht).
When you create a job, you paste the search URL from the platform into Fredy.
⚠️ Always make sure the search results are sorted by date, so Fredy picks up the newest listings first.

Adapter 📡

An adapter is the channel through which Fredy notifies you (Slack, Telegram, Email, ntfy, Discord, ...).
Each adapter has its own configuration (e.g. API keys, webhook URLs).
You can use multiple adapters at once; Fredy sends new listings through all of them.

Job 📅

A job combines providers and adapters.
Example: "Search apartments on ImmoScout24 + Immowelt and send results to Slack + Telegram."
Jobs run automatically at the interval you configure (see /conf/config.json).

MCP Server 🤖

Starting with V20, Fredy ships with a built-in MCP Server. This allows you to connect Fredy to LLMs (like Claude, ChatGPT, or local models via LM Studio) and query your real estate data using natural language. The local LLM can even enrich existing listings by checking the listing online.

For more information on how to set it up and use it, please refer to the MCP Readme.


Immoscout

Immoscout has implemented advanced bot detection. To work around this, Fredy uses a reverse-engineered version of their mobile API. See the Immoscout Reverse Engineering Documentation.

Analytics

Fredy is completely free (and will always remain free). However, it would be a huge help if you'd allow me to collect some analytical data. Before you freak out, let me explain...
If you agree, Fredy will send a ping once every 6 hours to my internal tracking project (which will be open-sourced soon).
The data includes: names of active adapters/providers, OS, architecture, Node version, and language. The information is entirely anonymous and helps me understand which adapters/providers are most frequently used.

Thanks🤘

🛠️ Development

Development Mode

yarn run start:backend:dev
yarn run start:frontend:dev

You should now be able to access Fredy from your browser. Check your terminal to see which port the frontend is running on.

Run Tests

"Online" tests

These tests run directly against the live providers.

yarn run test

"Offline" tests

These tests use the test fixtures instead of the live providers. They are much faster and "good enough" to test the core functionality.

yarn run test:offline

Download new fixtures

If you need to refresh the fixtures (necessary every once in a while because the providers change their code), run this command:

yarn run download-fixtures

📐 Architecture

flowchart TD
 subgraph Jobs["Jobs"]
        A1["Job 1"]
        A2["Job 2"]
        A3["Job 3"]
  end
 subgraph Providers["Providers"]
        C1["Provider 1"]
        C2["Provider 2"]
        C3["Provider 3"]
  end
 subgraph NotificationAdapters["Notification Adapters"]
        F1["Adapter 1"]
        F2["Adapter 2"]
  end

    A1 --> B["FredyPipelineExecutioner"]
    A2 --> B
    A3 --> B
    B --> C1 & C2 & C3
    C1 --> D["Similarity Check"]
    C2 --> D
    C3 --> D
    D --> E{"Duplicate?"}
    E -- No --> F1
    F1 --> F2
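
The flow in the diagram can be approximated in a few lines of JavaScript. This is a simplified sketch, not Fredy's FredyPipelineExecutioner: the provider method fetchListings(), the adapter method send(), and the exact-match key built from title, price and address are all assumptions made for illustration, and the real similarity check may be fuzzier.

```javascript
// Hypothetical shapes: a provider exposes fetchListings(), an adapter send().
// `seenKeys` is a Set that persists between runs (Fredy stores results instead).
async function runJob(job, seenKeys, keyOf = (l) => `${l.title}|${l.price}|${l.address}`) {
  // Jobs -> Providers: query every provider of the job in parallel.
  const perProvider = await Promise.all(job.providers.map((p) => p.fetchListings()));

  // Similarity check: drop anything already seen, on any platform.
  const fresh = [];
  for (const listing of perProvider.flat()) {
    const key = keyOf(listing);
    if (seenKeys.has(key)) continue; // "Duplicate?" -> yes, skip
    seenKeys.add(key);
    fresh.push(listing);
  }

  // "Duplicate?" -> no: hand the new listings to each adapter in turn.
  if (fresh.length > 0) {
    for (const adapter of job.adapters) {
      await adapter.send(fresh);
    }
  }
  return fresh;
}
```

Running the same job twice with the same seenKeys set returns the new listings on the first run and an empty array on the second, which is the "never see the same listing twice" behavior in miniature.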

🤖 Using AI such as Claude Code

When I started building Fredy, LLMs were still basically the wet dream of a few nerdy scientists.

Nowadays, it's easier than ever to throw a prompt into the LLM of your choice and let 'the AI' build your stuff. I'm not against that. I use Claude Code myself for smaller tasks, and I do think these tools can be really useful.

That said, I still believe humans should stay in charge. AI is great-ish at writing code, but it still lacks creativity, context, and the ability to see the full picture.

So, if you want to contribute to Fredy, using AI tools to get things done is totally fine. Just please don't stop thinking.

I've had one too many PRs full of hallucinated bullshit.

Thanks ;)


👐 Contributing

Thanks to everyone who has contributed!

See the Contributing Guide.

