Commit Graph

99 Commits

Author SHA1 Message Date
orangecoding
322ae199b0 allowing multiple chat IDs for telegram 2026-06-03 09:46:56 +02:00
Christian Kellner
44edf47393 Improved Listing Management (#317)
* adding ability to tag listings (e.g. if you have applied to one) / adding ability to add notes to a listing

* storing the date when a status was set
2026-06-02 21:10:08 +02:00
orangecoding
5ceac25aa6 fixing #319 & #318 2026-06-02 20:11:43 +02:00
orangecoding
ee2112a24d fixing tests harder 2026-06-02 10:55:16 +02:00
orangecoding
5a54448288 fixing tests 2026-06-02 10:49:06 +02:00
Christian Kellner
a834abc31c fixing filtering of lists (#311)
* fixing listing filtering by applying the correct id
2026-06-02 09:24:45 +02:00
orangecoding
996b841cfb adding ability to add proxies for cloak 2026-05-24 20:49:27 +02:00
orangecoding
8b012ef2f1 upgrading dependencies / new pois 2026-05-11 09:18:32 +02:00
Christian Kellner
ac02817d4e Switch browser engine from puppeteer-extra/stealth to CloakBrowser (#307)
* Switch browser engine from puppeteer-extra/stealth to CloakBrowser

- Replace puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth with
  cloakbrowser + puppeteer-core; CloakBrowser applies 49 source-level C++
  fingerprint patches that cannot be detected at the JS layer.
- Enable humanize:true in launchBrowser() for Bézier mouse curves, natural
  keyboard timing, and realistic scroll physics.
- Remove manual userDataDir management and ARM64 executablePath override;
  CloakBrowser ships its own binary for x86_64 and arm64.
- Proxy is now passed via CloakBrowser's native proxy option instead of
  --proxy-server Chrome flag.
- Dockerfile: add fonts-noto-color-emoji + fonts-freefont-ttf so canvas
  fingerprint hashes match real browsers (required for Kasada/Akamai);
  replace npx puppeteer browsers install with node ensureBinary() call;
  remove TARGETARCH ARG and ARM64 system-Chromium branch.
- Update test mock to reflect simplified browser object (no __fredy_* fields).

https://claude.ai/code/session_01WXzA3orbwE2hdk723c6MgH

* Add --ignore-certificate-errors for CloakBrowser's custom Chromium

CloakBrowser ships its own Chromium binary with an independent CA bundle.
This flag prevents ERR_CERT_AUTHORITY_INVALID failures in environments with
SSL-inspecting proxies or non-standard root CAs (Docker CI, corporate networks).


* Harden CloakBrowser integration and fix kleinanzeigen detail test

- Remove all CDP overrides (applyBotPreventionToPage, applyLanguagePersistence,
  applyPostNavigationHumanSignals) that created detectable inconsistencies on top
  of CloakBrowser's C++ patches; pass locale to CloakBrowser launch instead
- Drop --lang arg (replaced by CloakBrowser locale flag)
- Extend immowelt puppeteerTimeout to 90 s to accommodate React SPA rendering
  latency under CloakBrowser's humanise delays
- Fix kleinanzeigen detail test: serve the offline fixture for the search URL
  so only individual detail pages are fetched live, avoiding rate limiting from
  a second fresh session hitting the same search endpoint


* Fix immowelt bot detection with two-phase navigation and fixture-backed detail test

Immowelt's CDN challenges cold browser sessions before React can render the
listing grid, causing the old waitForSelector approach to silently time out.

- Add preNavigateUrl option to puppeteerExtractor: visits a warm-up page
  first so the site sees an established session before the search URL
- Add waitForNetworkIdle option: a second idle-wait phase after domcontentloaded
  that catches React's listing API round-trip (which fires long after the
  initial HTML is parsed); errors are swallowed so partial DOM is still used
- Switch immowelt config to waitForSelector=null + networkidle warm-up so
  page.content() is returned after the SPA has loaded its data
- Set immowelt preNavigateUrl to the homepage to warm the session
- In the detail enrichment test, spy on puppeteerExtractor to serve the
  offline fixture for the search URL; only individual listing detail pages
  are fetched live (they are far less aggressively protected)
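
The two-phase navigation described in this list can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: loadPage and idleTimeoutMs are hypothetical names, only the option names preNavigateUrl and waitForNetworkIdle come from the message above, and page is any Puppeteer-style page object.

```javascript
// Sketch of a two-phase page load (hypothetical helper, not Fredy's actual code).
// `page` is a Puppeteer-style page; option names mirror the commit message,
// the default timeout is an assumption.
async function loadPage(page, url, options = {}) {
  const { preNavigateUrl = null, waitForNetworkIdle = false, idleTimeoutMs = 30000 } = options;

  // Warm-up: visit another page first so the site sees an established session.
  if (preNavigateUrl) {
    await page.goto(preNavigateUrl, { waitUntil: 'domcontentloaded' });
  }

  // Phase 1: load the search URL until the initial HTML is parsed.
  await page.goto(url, { waitUntil: 'domcontentloaded' });

  // Phase 2: optionally wait for late API traffic (e.g. the SPA's data fetch) to settle.
  if (waitForNetworkIdle) {
    try {
      await page.waitForNetworkIdle({ timeout: idleTimeoutMs });
    } catch {
      // Swallow the timeout so a partially rendered DOM is still returned.
    }
  }

  return page.content();
}
```

Because the idle wait is best-effort, a slow listing API degrades to a partial DOM instead of failing the whole extraction.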


* Ensure CloakBrowser binary is present before any live test runs

Add a Vitest globalSetup that calls ensureBinary() once in the main process
before workers start. Without this, running yarn test on a fresh checkout
(or after the binary cache is cleared) immediately fails every browser-based
test with "Failed to launch the browser process" before any useful output
appears. The setup is a no-op in offline mode and when the binary is already
cached.


* Ensure CloakBrowser binary at startup for non-Docker installs

Direct runs (yarn start:backend) on a fresh checkout have no binary and
only crash when the first scraping job fires. Calling ensureBinary() at
startup downloads it on first run and is instant when already cached.
In Docker it stays a no-op since the binary is pre-baked during docker build.


* Fix --no-zygote comment: ICU crash was corrupted .4 binary, not fd issue

The "Invalid file descriptor to ICU data received" crash seen in Sparkasse
tests was caused by a partially-extracted CloakBrowser .4 binary that
contained only the chrome executable but was missing icudtl.dat and other
resource files. The ensureBinary() function returned this incomplete
installation because latest_version_linux-x64 pointed to .4.

The --no-zygote flag is kept as a safeguard for container environments
with limited kernel namespaces, but the comment now accurately describes
its purpose rather than attributing it to a non-existent fd inheritance issue.


* Add ensureValidBinary() to detect and auto-heal corrupt CloakBrowser installs

CloakBrowser's ensureBinary() only checks that the chrome executable exists,
not that required resource files (icudtl.dat, resources.pak) are present.
A partial extraction — e.g. an interrupted update — can leave a directory
that passes ensureBinary()'s check but causes Chrome to crash immediately
with "Invalid file descriptor to ICU data received".

ensureValidBinary() wraps ensureBinary() with a completeness check:
- If the required resource files are missing it removes the corrupt directory
  and all latest_version* markers, then calls ensureBinary() again so it
  falls back to (or re-downloads) a complete build.
- It pins the validated path via CLOAKBROWSER_BINARY_PATH so CloakBrowser's
  own internal ensureBinary() call inside launch() always uses the same,
  verified binary.

Used in index.js (app startup) and test/globalSetup.js (before live tests).
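
A minimal sketch of such a wrapper is shown below, assuming the Linux layout where the resource files sit next to the executable. The ensureBinary function is injected as a parameter because CloakBrowser's exact signature is not shown here, and the location of the latest_version* markers (one directory above the versioned install) is likewise an assumption. It is written as CommonJS so it runs standalone.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Resource files Chrome needs beside the executable (Linux layout, per the message above).
const REQUIRED_FILES = ['icudtl.dat', 'resources.pak'];

// True when every required resource file exists next to the binary.
function isBinaryComplete(binaryPath) {
  const dir = path.dirname(binaryPath);
  return REQUIRED_FILES.every((name) => fs.existsSync(path.join(dir, name)));
}

// `ensureBinary` is an injected async function resolving to the binary path;
// it stands in for CloakBrowser's ensureBinary() (signature assumed).
async function ensureValidBinary(ensureBinary, env = process.env) {
  let binaryPath = await ensureBinary();

  if (!isBinaryComplete(binaryPath)) {
    const installDir = path.dirname(binaryPath);
    const cacheDir = path.dirname(installDir); // assumed home of the markers

    // Remove the corrupt installation and all latest_version* markers.
    fs.rmSync(installDir, { recursive: true, force: true });
    for (const entry of fs.readdirSync(cacheDir)) {
      if (entry.startsWith('latest_version')) {
        fs.rmSync(path.join(cacheDir, entry), { force: true });
      }
    }

    // Second attempt: fall back to, or re-download, a complete build.
    binaryPath = await ensureBinary();
    if (!isBinaryComplete(binaryPath)) {
      throw new Error(`Incomplete browser installation at ${binaryPath}`);
    }
  }

  // Pin the verified path so later launches use the same binary.
  env.CLOAKBROWSER_BINARY_PATH = binaryPath;
  return binaryPath;
}
```

At startup this would be awaited once with CloakBrowser's ensureBinary passed in, before any scraping job launches a browser.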


* Fix sparkasse detail test: serve search URL from fixture to avoid rate-limiting

The second sparkasse test launched a fresh browser against the live search
endpoint right after the first test already did, leaving the IP in a suspicious
state that caused bot detection or rate-limiting to return empty results.
When getListings() returns nothing, execute() resolves to undefined and
expect(listings).toBeInstanceOf(Array) fails.

Apply the same hybrid fixture approach used by kleinanzeigen and immowelt:
intercept puppeteerExtractor calls whose pathname matches the search URL and
return the offline fixture, while letting individual detail page requests go
live (they are less aggressively rate-limited than the search endpoint).


* Fix sparkasse detail test: shared browser, direct fetchDetails call

Remove the fixture-backed spy — live tests must hit the real server.

Root problem: two cold browser sessions hitting sparkasse in quick succession
triggered bot detection, causing the second search request to return empty
results and execute() to resolve undefined.

Fix:
- One browser launched in beforeAll and reused across both tests, so both
  the search and detail requests come from the same warm session.
- The detail test calls provider.config.fetchDetails() directly on the
  listings returned by the first test instead of re-running the full pipeline.
  This avoids a redundant second scrape of the search page while still
  exercising the live detail endpoint.


* Eliminate fixture spies and double live requests in all provider detail tests

All five provider tests with a 'with provider_details enabled' describe block
were either (a) intercepting the search URL with an offline fixture to avoid
hitting the live server twice, or (b) re-running the full execute() pipeline
with a fresh browser, which triggered rate-limiting / bot detection on the
second cold request.

Pattern applied to all five:
- immowelt, kleinanzeigen, wgGesucht, immobilienDe: launch one browser in
  beforeAll/afterAll, pass it to the first test's Fredy constructor, and call
  provider.config.fetchDetails() directly in the second test using the listings
  and browser already in hand. One warm session, two live endpoints tested.
- immoscout: API-based (no browser), so no browser sharing needed. Second test
  calls provider.config.fetchDetails() directly on liveListings[0] from the
  first test instead of re-querying the search API.

Removed: all readFixture spies, getKnownListingHashesForJobAndProvider mocks,
and the puppeteerExtractorMod imports that were only needed for the spy.


* Fix ensureValidBinary for macOS: platform-aware completeness check

On macOS the CloakBrowser binary lives at:
  ~/.cloakbrowser/chromium-X.Y.Z/Chromium.app/Contents/MacOS/Chromium

path.dirname() gave Contents/MacOS/ — but icudtl.dat and resources.pak
are inside Contents/Frameworks/…, not next to the binary. So the old
code incorrectly flagged every macOS installation as corrupt, deleted only
the MacOS/ subdirectory (not the full versioned dir), then failed again.

Fixes:
- isBinaryComplete: on macOS check for Info.plist and Frameworks/ inside
  Chromium.app/Contents/ instead of looking for Linux resource files next
  to the binary. On Linux/Windows the existing check is unchanged.
- getVersionedDir: resolves the full chromium-X.Y.Z/ directory regardless
  of platform (4 levels up on macOS, 1 on Linux/Windows) so
  removeCorruptInstallation always deletes the entire versioned tree.
- missingDescription: reports the correct missing items per platform.
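
The platform-aware helpers might look roughly like this. The function names follow the list above, but the bodies are a sketch under assumptions rather than the committed code; the platform parameter is added here only so the behaviour can be exercised on any host.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Resolve the chromium-X.Y.Z/ directory from the binary path.
// macOS:         <versioned>/Chromium.app/Contents/MacOS/Chromium -> 4 levels up
// Linux/Windows: <versioned>/<executable>                         -> 1 level up
function getVersionedDir(binaryPath, platform = process.platform) {
  const levels = platform === 'darwin' ? 4 : 1;
  let dir = binaryPath;
  for (let i = 0; i < levels; i += 1) {
    dir = path.dirname(dir);
  }
  return dir;
}

// Check that the installation contains what the browser needs to start.
function isBinaryComplete(binaryPath, platform = process.platform) {
  if (platform === 'darwin') {
    // Contents/ is the parent of MacOS/, which holds the executable.
    const contents = path.dirname(path.dirname(binaryPath));
    return ['Info.plist', 'Frameworks'].every((name) => fs.existsSync(path.join(contents, name)));
  }
  // Linux/Windows: resource files sit next to the executable.
  const dir = path.dirname(binaryPath);
  return ['icudtl.dat', 'resources.pak'].every((name) => fs.existsSync(path.join(dir, name)));
}
```

Deriving the directory to delete from getVersionedDir rather than from path.dirname is what ensures the whole versioned tree is removed on macOS, not just the MacOS/ subdirectory.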


---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-05-10 15:42:31 +02:00
orangecoding
921057252d adding immoscout shape search 2026-05-07 19:11:47 +02:00
orangecoding
3d10dc6042 moving from restana to fastify 2026-04-27 16:56:04 +02:00
orangecoding
c78472bd19 adding 'open in fredy' 2026-04-21 19:42:39 +02:00
orangecoding
8c5607e20b adding test fixtures so that we can run tests 'offline' 2026-04-21 13:37:00 +02:00
orangecoding
e2d10d179e next release version 2026-04-12 09:21:08 +02:00
Stephan
10c94eea0a Feature/spec filter (#276)
* feat(): create map component, add area filtering to the job config

* feat(): filter listings by area filter

* chore(): cleanup

* feat(): solve feedback

* feat(): solve most providers

* feat(): solve maybe other providers

* feat(): add specFilter config, also add rooms to listing

* feat(): change tests

* feat(): fix kleinanzeigen parser

* feat(): add spec filter switch for listing overviews

* feat(): add rooms and size to the overview and detail of a listing

* feat(): rem label

* feat(): add types, update providers, they now return specs as numbers

* feat(): add jsonconfig to enable type checks

* feat: add type for providerConfig, add fieldNames per provider

* feat: fix tests, provider, add formatListing

* chore: remove duplicates

* feat(): fix tests

* feat: fix immoscout

* chore: geojson typing

* feat: solve requested changes
2026-04-12 09:17:23 +02:00
Christian Kellner
cdc0cbda2f Feature/kleinanzeigen new (#292)
* Feature/Kleinanzeigen addresses (#289)

* upgrade dependencies

* immoscout_details -> provider_details

* fetching details more generic

* removing claude action

* fixing sparkassen selector

* improvements

* fixing immobilienDE test

* upgrading dependencies

* settings for many providers

---------

Co-authored-by: Adrian Bach <65734063+realDayaa@users.noreply.github.com>
2026-04-07 19:53:40 +02:00
orangecoding
cbf2766783 cleanup 2026-03-16 14:48:01 +01:00
orangecoding
1b39e345b6 moving from jest to vitest 2026-03-16 14:26:58 +01:00
orangecoding
4596442f64 upgrading dependencies | mark listings as 'manually_removed' when filtered 2026-03-08 09:55:46 +01:00
Stephan
0bcfa1d4ad feat(): map area filter (#273)
* feat(): create map component, add area filtering to the job config

* feat(): filter listings by area filter

* chore(): cleanup

* feat(): solve feedback

* feat(): solve most providers

* feat(): solve maybe other providers
2026-03-08 09:44:18 +01:00
Christian Kellner
00d6a12b30 Puppeteer improvements (#270)
* improve puppeteer handling. Now only 1 puppeteer instance is being used which is WAY more efficient

* removing package-lock

* reduce logging

* removing problematic docker command

* Remove Immonet. They now belong to immowelt
2026-02-18 20:05:02 +01:00
orangecoding
3117044139 fixing immoscout scraper 2026-01-26 19:52:37 +01:00
Christian Kellner
4dd0370ec1 Calculating the distance (#255)
* migra for distance

* adding distance calculator

* adding ability to store home address

* improve distance calculation

* calculating distance

* show distance in grid view

* upgrading dependencies

* moving to react 19

* ability to clone a job

* fixing tests

* polishing
2026-01-22 16:09:36 +01:00
Christian Kellner
d43c5b3f97 Map View in Fredy :D (#253)
* init map view

* switching off 3d buildings when satellite view is on

* rename menu items

* upgrading dependencies, adding provider to popups

* adding screenshot for map view

* fixing readme

* next release version
2026-01-12 15:00:36 +01:00
orangecoding
7fd8be07a2 adding wohnungsboerse provider 2026-01-09 11:37:03 +01:00
orangecoding
5dc976c7e3 ability to start jobs individually 2025-12-18 19:16:28 +01:00
orangecoding
05f1bc61c9 fixing tests 2025-12-17 16:35:24 +01:00
orangecoding
6e8a35a836 adding backup/restore ability 2025-12-17 15:48:56 +01:00
orangecoding
790c559316 forced to move to Apache 2.0 license 2025-12-11 10:40:55 +01:00
orangecoding
5bd4219743 upgrading dependencies | adding ohneMakler provider 2025-12-08 20:31:28 +01:00
orangecoding
22df683969 more efficient bot protection 2025-11-27 10:30:47 +01:00
orangecoding
79a8420dfb improving similarity cache 2025-10-29 09:36:05 +01:00
orangecoding
0436c7f7d7 upgrading dependencies / FredyRuntime >> FredyPipeline 2025-10-12 16:43:56 +02:00
Thomas Brockmöller
7ebd73c9cf Add new provider McMakler (#201) 2025-09-28 14:16:28 +02:00
Thomas Brockmöller
4d37e890ab Add provider for Regionalimmobilien24 (#197) 2025-09-27 14:19:37 +02:00
Thomas Brockmöller
7589f20a18 Add sparkasse immobilien (#199) 2025-09-27 09:43:24 +02:00
Thomas Brockmöller
702ffabc1a Fix and improve immowelt/immonet provider (#194)
* Fix and improve immowelt provider

* Add description to immonet provider

* Fix tests and improve readability
2025-09-27 09:42:08 +02:00
Christian Kellner
8324357edb Improvements (#193)
* improving release banner

* renaming general to settings

* fixing working hours if they go to next day

* fixing comparing versions

* upgrade dependencies
2025-09-26 10:45:55 +02:00
Thomas Brockmöller
dd5c5b29d9 Fix address value in similarity filtering (#191)
* Fix address field in similarity filter
2025-09-25 15:02:00 +02:00
Christian Kellner
c839f3abc9 Check if a listing is still active (#184)
* check if a listing is still active

* upgrade dependencies
2025-09-22 09:57:50 +02:00
orangecoding
da8fd13973 fixing immoscout 2025-09-19 21:11:28 +02:00
orangecoding
28f0a167e6 fixing docker migration path 2025-09-18 17:28:30 +02:00
Christian Kellner
8d95f052c6 Migrate to SQLite (#174)
* Migrating Fredy from LowDb to SQLite 🎉

* adding new sql migration system for future sql migrations

* adding setting to change sqlite path for db files

* create migration plan for graceful migration lowdb -> sqlite

* Improving Documentation

* adding test for sqliteconnection

* upgrading dependencies

* making nodejs 22 as min version

* improve scraper

* adding overwrite ability for db migra
2025-09-18 15:38:23 +02:00
orangecoding
09c6ce1d0b improve similarity cache. It now checks for similarities independently of jobs 2025-09-07 22:15:14 +02:00
Christian Kellner
1854b421af avoid warnings on test 2025-09-03 14:47:56 +02:00
Christian Kellner
f0b146fd7f Adding images to scraping data (#157)
* Fredy now supports pulling the main image from the listing and sending it together with the usual information
2025-08-30 21:21:34 +02:00
Nic
3a54ab0e31 Fix typo in test import (#154)
* Fix typo in test import

* Rename test file
2025-08-25 20:42:40 +02:00
Alexander Roidl
2b36f868e7 Project-wide linting and formatting (#150)
* chore: configure project-wide linting and formatting

* chore: run lint autofix and formatter
2025-07-26 20:42:58 +02:00
Alexander Roidl
cca1463a68 chore: run formatter (#145) 2025-07-23 08:47:26 +02:00
Christian Kellner
f032e6a724 test: verify unrelated text yields no similarity (#130) 2025-06-04 09:15:53 +02:00