Compare commits


8 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| orangecoding | b56e13aa16 | upgrading dependencies | 2026-06-02 09:26:46 +02:00 |
| Christian Kellner | a834abc31c | fixing filtering of lists (#311); fixing listing filtering by applying the correct id | 2026-06-02 09:24:45 +02:00 |
| Ramin | 573868eccb | feat(ui): zoom map to saved area when editing a job (#313); Fit the job edit map to existing polygon areas on init. | 2026-06-02 08:54:56 +02:00 |
| orangecoding | 1a210d7c1c | poi for proxies | 2026-05-25 11:54:49 +02:00 |
| orangecoding | 996b841cfb | adding ability to add proxies for cloak | 2026-05-24 20:49:27 +02:00 |
| orangecoding | b2e294e38c | next release version | 2026-05-21 21:46:13 +02:00 |
| orangecoding | 8afeaa05d9 | fixing cloakbrowser connection issue | 2026-05-21 21:45:57 +02:00 |
| orangecoding | ec47137b89 | upgrading dependencies | 2026-05-21 21:40:35 +02:00 |
14 changed files with 1104 additions and 1039 deletions

View File

@@ -167,6 +167,40 @@ For more information on how to set it up and use it, please refer to the [MCP Re
Immoscout has implemented advanced bot detection. To work around this, we use a reverse-engineered version of their mobile API. See [Immoscout Reverse Engineering Documentation](https://github.com/orangecoding/fredy/blob/master/reverse-engineered-immoscout.md)
+ ## 🛡️ Bot Detection & Proxies
+ Most browser-based providers (immowelt, immonet, kleinanzeigen, ...) are scraped through a hardened headless browser ([CloakBrowser](https://www.npmjs.com/package/cloakbrowser)). It makes the **browser fingerprint** indistinguishable from that of a real Chrome browser, which is enough when you run Fredy on a normal home connection.
+ On a **server / VPS, requests usually originate from a datacenter IP**, and providers behind anti-bot systems (e.g. AWS CloudFront/WAF) block those based on **IP reputation alone**, no matter how convincing the fingerprint is. The typical symptom: Fredy works locally, but on the server you get `We have been detected as a bot :-/`.
+ ### The fix: a residential proxy
+ A **residential proxy** routes Fredy's browser through the internet connection of a real household, so the provider sees a "normal user" IP instead of a datacenter IP. For German portals, use a **German (DE) residential** (or mobile/4G) proxy. Plain VPNs and **datacenter proxies do not help** here; they share the same poor reputation as your server.
+ **Configure it** under **Settings → Execution → Proxy URL**. Supported formats:
+ ```
+ http://user:pass@host:port
+ socks5://user:pass@host:port
+ ```
+ Leave the field empty to disable the proxy. The proxy applies to all headless-browser providers and takes effect on the next job run (no restart needed). Immoscout uses a separate mobile API and is not affected.
+ ### Where to get a residential proxy
+ Residential proxies are a paid service (usually billed per GB; Fredy's traffic is small). Well-known providers offering German residential IPs include:
+ | Provider | Notes |
+ |---|---|
+ | [IPRoyal](https://iproyal.com) | Pay-as-you-go, no monthly minimum, good for low volume |
+ | [Webshare](https://www.webshare.io) | Cheap entry tier, has a small free plan to test with |
+ | [Decodo (formerly Smartproxy)](https://decodo.com) | Easy setup, country/city targeting |
+ | [SOAX](https://soax.com) | Residential + mobile, fine-grained geo-targeting |
+ | [Bright Data](https://brightdata.com) | Largest pool, most features, higher complexity/price |
+ | [Oxylabs](https://oxylabs.io) | Enterprise-grade, larger plans |
+ This is not an endorsement; pick whatever fits your budget. For low-volume use like Fredy, a pay-as-you-go plan (e.g. IPRoyal) or a cheap entry tier (e.g. Webshare) is usually plenty. Make sure to select **Germany** as the proxy location and keep the search interval reasonable (the higher the interval, the less you look like a bot).
## Analytics
Fredy is completely free (and will always remain free). However, it would be a huge help if you'd allow me to collect some analytical data.

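The README lists two accepted proxy URL shapes. The settings code in this comparison only trims the value before saving, so a malformed URL would only surface when the browser launches. A pre-save check could look like the following sketch. `isSupportedProxyUrl` is a hypothetical helper, not part of Fredy, and it relies only on the WHATWG `URL` class built into Node.js and browsers.

```javascript
// Hypothetical pre-save check (not part of Fredy) for the two documented
// formats: http://user:pass@host:port and socks5://user:pass@host:port.
const SUPPORTED_PROXY_PROTOCOLS = ['http:', 'socks5:'];

function isSupportedProxyUrl(value) {
  const trimmed = typeof value === 'string' ? value.trim() : '';
  // An empty field is valid: it disables the proxy.
  if (trimmed === '') return true;
  let url;
  try {
    url = new URL(trimmed);
  } catch {
    // Not parseable as an absolute URL (e.g. "not a url").
    return false;
  }
  // "host:8080" parses with protocol "host:", so it is rejected here as well.
  return SUPPORTED_PROXY_PROTOCOLS.includes(url.protocol) && url.hostname !== '';
}

console.log(isSupportedProxyUrl('socks5://user:pass@host:1080')); // true
console.log(isSupportedProxyUrl('host:8080')); // false
```

The port is deliberately not required. `URL` reports `port` as an empty string when it equals the scheme's default (e.g. `http://host:80`), so a port check would reject such a URL even though it is valid.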
View File

@@ -5,9 +5,10 @@
import { NoNewListingsWarning } from './errors.js';
import {
- storeListings,
- getKnownListingHashesForJobAndProvider,
deleteListingsById,
+ getKnownListingHashesForJobAndProvider,
+ storeListings,
+ updateListingDistance,
} from './services/storage/listingsStorage.js';
import { getJob } from './services/storage/jobStorage.js';
import * as notify from './notification/notify.js';
@@ -16,8 +17,7 @@ import urlModifier from './services/queryStringMutator.js';
import logger from './services/logger.js';
import { geocodeAddress } from './services/geocoding/geoCodingService.js';
import { distanceMeters } from './services/listings/distanceCalculator.js';
- import { getUserSettings, getSettings } from './services/storage/settingsStorage.js';
- import { updateListingDistance } from './services/storage/listingsStorage.js';
+ import { getSettings, getUserSettings } from './services/storage/settingsStorage.js';
import booleanPointInPolygon from '@turf/boolean-point-in-polygon';
import { formatListing } from './utils/formatListing.js';
@@ -97,9 +97,9 @@ class FredyPipelineExecutioner {
}
/**
- * Optionally enrich new listings with data from their detail pages.
+ * Optionally, enrich new listings with data from their detail pages.
* Only called when the provider config defines a `fetchDetails` function.
- * Runs all fetches in parallel. Each individual fetch must handle its own errors
+ * Runs all fetches in parallel. Each fetch must handle its own errors
* and always resolve (never reject) to avoid aborting other listings.
*
* @param {Listing[]} newListings New listings to enrich.
@@ -132,7 +132,7 @@ class FredyPipelineExecutioner {
for (const listing of newListings) {
if (listing.address) {
const coords = await geocodeAddress(listing.address);
- if (coords) {
+ if (coords && coords.lat !== -1 && coords.lng !== -1) {
listing.latitude = coords.lat;
listing.longitude = coords.lng;
}
@@ -264,15 +264,15 @@ class FredyPipelineExecutioner {
const requiredKeys = this._providerConfig.requiredFieldNames;
const requireValues = ['id', 'link', 'title'];
- const filteredListings = listings
- // this should never filter some listings out, because the normalize function should always extract all fields.
- .filter((item) => requiredKeys.every((key) => key in item))
- // TODO: move blacklist filter to this file, so it will handle for all providers in same way.
- .filter(this._providerConfig.filter)
- // filter out listings that are missing required fields
- .filter((item) => requireValues.every((key) => item[key] != null));
- return filteredListings;
+ return (
+ listings
+ // this should never filter some listings out, because the normalize function should always extract all fields.
+ .filter((item) => requiredKeys.every((key) => key in item))
+ // TODO: move blacklist filter to this file, so it will handle for all providers in same way.
+ .filter(this._providerConfig.filter)
+ // filter out listings that are missing required fields
+ .filter((item) => requireValues.every((key) => item[key] != null))
+ );
}
/**

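The pipeline diff contains two behavioral details. First, a geocoding result is now discarded when either coordinate is `-1`, which the new condition suggests the geocoder uses as a "not found" marker. Second, listings pass three successive filters. The standalone sketch below restates both as plain functions so they can be exercised without a provider. `hasUsableCoords` and `filterListings` are hypothetical names, and `providerFilter` stands in for `this._providerConfig.filter`.

```javascript
// Hypothetical restatement of the new geocoding guard: use a result only when it
// exists and neither coordinate equals the -1 sentinel.
function hasUsableCoords(coords) {
  return coords != null && coords.lat !== -1 && coords.lng !== -1;
}

// Hypothetical restatement of the listing filter chain.
// requiredKeys mirrors providerConfig.requiredFieldNames; providerFilter mirrors
// providerConfig.filter (e.g. a blacklist check).
function filterListings(listings, requiredKeys, providerFilter) {
  const requiredValues = ['id', 'link', 'title'];
  return (
    listings
      // 1. every provider-declared field exists as a key (its value may still be null)
      .filter((item) => requiredKeys.every((key) => key in item))
      // 2. provider-specific filter
      .filter(providerFilter)
      // 3. id, link and title must be neither null nor undefined
      .filter((item) => requiredValues.every((key) => item[key] != null))
  );
}

const sample = [
  { id: 'a', link: 'https://example.org/a', title: 'Flat A', price: 900 },
  { id: null, link: 'https://example.org/b', title: 'Flat B', price: 950 },
  { id: 'c', link: 'https://example.org/c', title: 'Flat C' }, // no price key
];
console.log(filterListings(sample, ['id', 'price'], () => true).map((l) => l.id)); // [ 'a' ]
```

Stage 1 only checks that a key exists, so a listing with `id: null` survives it and is removed by stage 3.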
View File

@@ -10,5 +10,6 @@ export const TRACKING_POIS = {
JOBS_TABLE_VIEW: 'JOBS_TABLE_VIEW',
LISTING_TABLE_VIEW: 'LISTING_TABLE_VIEW',
BASE_URL_SETTING: 'BASE_URL_SETTING',
+ SET_PROXY_SETTING: 'SET_PROXY_SETTING',
DETECTED_AS_BOT: 'DETECTED_AS_BOT',
};

View File

@@ -44,6 +44,9 @@ export default async function generalSettingsPlugin(fastify) {
if (appSettings.baseUrl != null) {
await trackPoi(TRACKING_POIS.BASE_URL_SETTING);
}
+ if (appSettings.proxyUrl != null) {
+ await trackPoi(TRACKING_POIS.SET_PROXY_SETTING);
+ }
} catch (err) {
logger.error(err);
return reply.code(500).send({ error: 'Error while trying to write settings.' });

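The settings route records `SET_PROXY_SETTING` whenever `appSettings.proxyUrl != null`. In JavaScript, `'' != null` is `true`. The settings form in this comparison posts `proxyUrl: proxyUrl?.trim() ?? ''`. If `appSettings` is built from that request body (the code that builds it is not shown), saving with an empty proxy field would therefore still be tracked. The sketch below contrasts the loose check with a stricter, hypothetical `hasNonEmptyProxy` helper.

```javascript
// The check used by the settings route: true for any value except null/undefined.
function isSetLoose(value) {
  return value != null;
}

// Hypothetical stricter check: only a non-blank string counts as a configured proxy.
function hasNonEmptyProxy(value) {
  return typeof value === 'string' && value.trim() !== '';
}

for (const value of [undefined, null, '', '   ', 'http://user:pass@host:8080']) {
  console.log(JSON.stringify(value), isSetLoose(value), hasNonEmptyProxy(value));
}
```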
View File

@@ -4,7 +4,7 @@
*/
import { launch } from 'cloakbrowser/puppeteer';
- import { debug, botDetected } from './utils.js';
+ import { botDetected, debug } from './utils.js';
import { getPreLaunchConfig } from './botPrevention.js';
import logger from '../logger.js';
import { trackPoi } from '../tracking/Tracker.js';
@@ -50,7 +50,7 @@ export async function launchBrowser(url, options) {
preCfg.windowSizeArg,
];
- const browser = await launch({
+ return await launch({
headless: options?.puppeteerHeadless ?? true,
humanize: true,
args,
@@ -59,8 +59,6 @@ export async function launchBrowser(url, options) {
...(options?.proxyUrl ? { proxy: options.proxyUrl } : {}),
...(preCfg.timezone ? { timezone: preCfg.timezone } : {}),
});
- return browser;
}
/**

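`launchBrowser` adds the `proxy` and `timezone` options through conditional object spread. `...(cond ? { key: value } : {})` contributes the key only when the condition holds, so CloakBrowser does not receive a `proxy` key at all when no `proxyUrl` is set. The new unit test checks the observable result: `proxy` is `undefined` in the first launch argument. The sketch below isolates the pattern. `buildLaunchOptions` is a hypothetical name, and the real function also passes `humanize`, `args` and further options not reproduced here.

```javascript
// Hypothetical helper isolating the conditional-spread pattern from launchBrowser.
function buildLaunchOptions(options, preCfg) {
  return {
    headless: options?.puppeteerHeadless ?? true,
    // Adds `proxy` only for a truthy proxyUrl; an empty string adds nothing.
    ...(options?.proxyUrl ? { proxy: options.proxyUrl } : {}),
    ...(preCfg?.timezone ? { timezone: preCfg.timezone } : {}),
  };
}

console.log(buildLaunchOptions({ proxyUrl: 'http://user:pass@host:8080' }, {}));
console.log(buildLaunchOptions({}, { timezone: 'Europe/Berlin' }));
```

Spreading `{}` is a no-op, which keeps the option object free of `undefined` entries.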
View File

@@ -14,6 +14,7 @@ import * as similarityCache from '../similarity-check/similarityCache.js';
import { isRunning, markFinished, markRunning } from './run-state.js';
import { sendToUsers } from '../sse/sse-broker.js';
import * as puppeteerExtractor from '../extractor/puppeteerExtractor.js';
+ import { getSettings } from '../storage/settingsStorage.js';
/**
* Initializes the job execution service.
@@ -160,6 +161,14 @@ export function initJobExecutionService({ providers, settings, intervalMs }) {
}
let browser;
try {
+ // Read the proxy live (not from the startup snapshot) so changing it in the
+ // UI takes effect on the next run without a backend restart. An empty value
+ // disables the proxy. Routing the headless browser through a (German
+ // residential) proxy avoids datacenter-IP based bot detection on the
+ // Puppeteer-based providers (immowelt, immonet, kleinanzeigen, ...).
+ const liveSettings = await getSettings();
+ const proxyUrl = typeof liveSettings?.proxyUrl === 'string' ? liveSettings.proxyUrl.trim() : '';
const jobProviders = job.provider.filter(
(p) => providers.find((loaded) => loaded.metaInformation.id === p.id) != null,
);
@@ -168,14 +177,14 @@ export function initJobExecutionService({ providers, settings, intervalMs }) {
const matchedProvider = providers.find((loaded) => loaded.metaInformation.id === prov.id);
matchedProvider.init({ ...prov, userId: job.userId }, job.blacklist);
- if (browser && !browser.isConnected()) {
+ if (browser && !browser.connected) {
logger.debug('Browser is disconnected, nullifying to launch a new one.');
await puppeteerExtractor.closeBrowser(browser);
browser = null;
}
if (!browser && matchedProvider.config.getListings == null) {
- browser = await puppeteerExtractor.launchBrowser(matchedProvider.config.url, {});
+ browser = await puppeteerExtractor.launchBrowser(matchedProvider.config.url, proxyUrl ? { proxyUrl } : {});
}
await new FredyPipelineExecutioner(matchedProvider.config, job, prov.id, similarityCache, browser).execute();

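Two pieces of logic in the job execution diff are easy to test in isolation:

- turning the live settings into launch options, where only a non-blank string enables the proxy;
- deciding per provider whether the shared browser must be closed and relaunched.

The switch from `browser.isConnected()` to the `browser.connected` property follows Puppeteer's API. To our knowledge, `isConnected()` is deprecated in favor of the `connected` getter. Both functions below are hypothetical restatements, not Fredy code. `planBrowser` accepts any object with a `connected` property in place of a real Puppeteer `Browser`.

```javascript
// Hypothetical restatement: a missing, non-string or blank proxyUrl disables the proxy.
function proxyLaunchOptions(settings) {
  const proxyUrl = typeof settings?.proxyUrl === 'string' ? settings.proxyUrl.trim() : '';
  return proxyUrl ? { proxyUrl } : {};
}

// Hypothetical restatement of the per-provider browser handling:
// - a disconnected browser is closed and discarded,
// - a new one is launched only if none is left and the provider has no own getListings.
function planBrowser(browser, providerConfig) {
  const closeStale = browser != null && !browser.connected;
  const current = closeStale ? null : browser;
  const launch = current == null && providerConfig.getListings == null;
  return { closeStale, launch };
}

console.log(proxyLaunchOptions({ proxyUrl: '  socks5://user:pass@host:1080  ' }));
console.log(planBrowser({ connected: false }, {}));
```

Reading the settings inside the run, rather than from the startup snapshot, is what lets a changed proxy apply on the next job run without a restart.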
View File

@@ -214,6 +214,8 @@ export const storeListings = (jobId, providerId, listings) => {
longitude: item.longitude || null,
};
stmt.run(params);
+ // Propagate the DB primary key back so downstream pipeline steps use the correct id
+ item.id = params.id;
}
});
@@ -417,9 +419,10 @@ export const deleteListingsByJobId = (jobId, hardDelete = false) => {
};
/**
- * Delete listings by a list of listing IDs.
+ * Delete listings by a list of listing IDs (the nanoid primary key stored in the `id` column).
+ * Used by API routes that receive row IDs from the client.
*
- * @param {string[]} ids - Array of listing IDs to delete.
+ * @param {string[]} ids - Array of DB row IDs to delete.
* @param {boolean} [hardDelete=false] - Whether to hard delete from DB or just mark as deleted.
* @returns {any} The result from SqliteConnection.execute.
*/

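The fix for #311 writes the primary key generated in `storeListings` back onto each listing (`item.id = params.id`). As the new comment states, downstream pipeline steps then use the stored row id instead of the provider's own id. The in-memory sketch below shows the same write-back. `storeInMemory`, the `sourceId` column and the injected `generateId` function are hypothetical. The real code writes to SQLite, and the diff shows nothing about how `params.id` is generated beyond the docstring's mention of nanoid.

```javascript
// Hypothetical in-memory stand-in for storeListings (the real one uses SQLite).
// generateId is injected so the example is deterministic.
function storeInMemory(rows, listings, generateId) {
  for (const item of listings) {
    const params = {
      id: generateId(), // primary key of the stored row
      sourceId: item.id, // the provider's own id (hypothetical column name)
      title: item.title,
    };
    rows.push(params);
    // Write-back: from here on, item.id is the row's primary key.
    item.id = params.id;
  }
  return listings;
}

let counter = 0;
const rows = [];
const listings = [{ id: 'provider-123', title: 'Flat A' }];
storeInMemory(rows, listings, () => `row-${++counter}`);
console.log(listings[0].id, rows[0].sourceId); // row-1 provider-123
```

The function mutates the listing objects it is given, as the original does, so every later step holding a reference to them sees the new id.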
View File

@@ -1,6 +1,6 @@
{
"name": "fredy",
- "version": "22.0.8",
+ "version": "22.2.0",
"description": "[F]ind [R]eal [E]states [d]amn eas[y].",
"scripts": {
"prepare": "husky",
@@ -62,9 +62,9 @@
"Firefox ESR"
],
"dependencies": {
- "@douyinfe/semi-icons": "^2.97.0",
- "@douyinfe/semi-ui": "2.97.0",
- "@douyinfe/semi-ui-19": "^2.97.0",
+ "@douyinfe/semi-icons": "^2.99.3",
+ "@douyinfe/semi-ui": "2.99.3",
+ "@douyinfe/semi-ui-19": "^2.99.3",
"@fastify/cookie": "^11.0.2",
"@fastify/helmet": "^13.0.2",
"@fastify/session": "^11.1.1",
@@ -78,7 +78,7 @@
"better-sqlite3": "^12.10.0",
"chart.js": "^4.5.1",
"cheerio": "^1.2.0",
- "cloakbrowser": "^0.3.28",
+ "cloakbrowser": "^0.3.31",
"fastify": "^5.8.5",
"handlebars": "4.7.9",
"maplibre-gl": "^5.24.0",
@@ -86,41 +86,41 @@
"node-cron": "^4.2.1",
"node-fetch": "3.3.2",
"node-mailjet": "6.0.11",
- "nodemailer": "^8.0.7",
+ "nodemailer": "^8.0.10",
"p-throttle": "^8.1.0",
"package-up": "^5.0.0",
- "puppeteer-core": "^25.0.4",
- "query-string": "9.3.1",
- "react": "19.2.6",
+ "puppeteer-core": "^25.1.0",
+ "query-string": "9.4.0",
+ "react": "19.2.7",
"react-chartjs-2": "^5.3.1",
- "react-dom": "19.2.6",
+ "react-dom": "19.2.7",
"react-range-slider-input": "^3.3.5",
- "react-router": "7.15.1",
- "react-router-dom": "7.15.1",
- "resend": "^6.12.3",
- "semver": "^7.8.0",
+ "react-router": "7.16.0",
+ "react-router-dom": "7.16.0",
+ "resend": "^6.12.4",
+ "semver": "^7.8.1",
"slack": "11.0.2",
- "vite": "8.0.13",
+ "vite": "8.0.16",
"x-var": "^3.0.1",
- "zustand": "^5.0.13"
+ "zustand": "^5.0.14"
},
"devDependencies": {
- "@babel/core": "7.29.0",
- "@babel/eslint-parser": "7.28.6",
- "@babel/preset-env": "7.29.5",
- "@babel/preset-react": "7.28.5",
+ "@babel/core": "7.29.7",
+ "@babel/eslint-parser": "7.29.7",
+ "@babel/preset-env": "7.29.7",
+ "@babel/preset-react": "7.29.7",
"@eslint/js": "^10.0.1",
"chalk": "^5.6.2",
- "eslint": "10.4.0",
+ "eslint": "10.4.1",
"eslint-config-prettier": "10.1.8",
"eslint-plugin-react": "7.37.5",
"globals": "^17.6.0",
"history": "5.3.0",
"husky": "9.1.7",
"less": "4.6.4",
- "lint-staged": "17.0.5",
+ "lint-staged": "17.0.7",
"nodemon": "^3.1.14",
"prettier": "3.8.3",
- "vitest": "^4.1.6"
+ "vitest": "^4.1.8"
}
}

View File

@@ -32,4 +32,7 @@ export const deletedIds = [];
export const deleteListingsById = (ids) => {
deletedIds.push(...ids);
};
+ export const deleteListingsByHash = (hashes) => {
+ deletedIds.push(...hashes);
+ };
/* eslint-enable no-unused-vars */

View File

@@ -0,0 +1,37 @@
+ /*
+ * Copyright (c) 2026 by Christian Kellner.
+ * Licensed under Apache-2.0 with Commons Clause and Attribution/Naming Clause
+ */
+ import { vi, describe, it, expect, beforeEach } from 'vitest';
+ // Mock the CloakBrowser launcher so no real Chromium binary is needed and we can
+ // assert which options get forwarded to it.
+ const { launchMock } = vi.hoisted(() => ({ launchMock: vi.fn() }));
+ vi.mock('cloakbrowser/puppeteer', () => ({
+ launch: launchMock,
+ }));
+ const { launchBrowser } = await import('../../../lib/services/extractor/puppeteerExtractor.js');
+ describe('launchBrowser proxy forwarding', () => {
+ beforeEach(() => {
+ launchMock.mockReset();
+ launchMock.mockResolvedValue({ close: async () => {} });
+ });
+ it('forwards proxyUrl to CloakBrowser as the proxy option', async () => {
+ await launchBrowser('https://www.immowelt.de/', { proxyUrl: 'http://user:pass@host:8080' });
+ expect(launchMock).toHaveBeenCalledTimes(1);
+ expect(launchMock.mock.calls[0][0]).toMatchObject({ proxy: 'http://user:pass@host:8080' });
+ });
+ it('does not set a proxy when no proxyUrl is given', async () => {
+ await launchBrowser('https://www.immowelt.de/', {});
+ expect(launchMock).toHaveBeenCalledTimes(1);
+ expect(launchMock.mock.calls[0][0].proxy).toBeUndefined();
+ });
+ });

View File

@@ -18,6 +18,7 @@ describe('services/jobs/jobExecutionService', () => {
const busPath = root + '/lib/services/events/event-bus.js';
const jobStoragePath = root + '/lib/services/storage/jobStorage.js';
const userStoragePath = root + '/lib/services/storage/userStorage.js';
+ const settingsStoragePath = root + '/lib/services/storage/settingsStorage.js';
const brokerPath = root + '/lib/services/sse/sse-broker.js';
const utilsPath = root + '/lib/utils.js';
const loggerPath = root + '/lib/services/logger.js';
@@ -33,6 +34,9 @@ describe('services/jobs/jobExecutionService', () => {
getUsers: () => state.users.slice(),
getUser: (id) => state.users.find((u) => u.id === id) || null,
}));
+ vi.doMock(settingsStoragePath, () => ({
+ getSettings: async () => ({}),
+ }));
vi.doMock(brokerPath, () => ({
sendToUsers: (...args) => calls.sent.push(args),
}));

View File

@@ -8,6 +8,7 @@ import maplibregl from 'maplibre-gl';
import 'maplibre-gl/dist/maplibre-gl.css';
import '@mapbox/mapbox-gl-draw/dist/mapbox-gl-draw.css';
import { fixMapboxDrawCompatibility, addDrawingControl, setupAreaFilterEventListeners } from './MapDrawingExtension.js';
+ import { getBoundsFromCoords } from '../../views/listings/mapUtils.js';
import './Map.less';
export const GERMANY_BOUNDS = [
@@ -66,6 +67,7 @@ export default function Map({
const mapContainerRef = useRef(null);
const mapRef = useRef(null);
const drawRef = useRef(null);
+ const hasFittedToInitialAreaRef = useRef(false);
// Initialize map - ONLY when container changes, never reinitialize
useEffect(() => {
@@ -128,6 +130,17 @@ export default function Map({
} catch (error) {
console.error('Error loading spatial filter:', error);
}
+ if (!hasFittedToInitialAreaRef.current) {
+ const coords = initialSpatialFilter.features.flatMap((feature) =>
+ feature.geometry?.type === 'Polygon' ? feature.geometry.coordinates.flat() : [],
+ );
+ const bounds = getBoundsFromCoords(coords);
+ if (bounds) {
+ mapRef.current.fitBounds(bounds, { padding: 50, maxZoom: 15, duration: 0 });
+ hasFittedToInitialAreaRef.current = true;
+ }
+ }
}
// Setup drawing event listeners

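When an existing job is edited, the map component now fits the view to the saved polygons once, guarded by `hasFittedToInitialAreaRef`. Polygon positions are collected with `coordinates.flat()`, which merges all rings of a `Polygon` into one list of `[lng, lat]` pairs; other geometry types are skipped. `getBoundsFromCoords` lives in `mapUtils.js`, which this comparison does not show. The function below is therefore a hypothetical stand-in. It returns `[[minLng, minLat], [maxLng, maxLat]]`, one of the bounds formats MapLibre's `fitBounds` accepts.

```javascript
// Hypothetical stand-in for the bounds computation (not the real mapUtils.js code).
function boundsFromPolygonFeatures(features) {
  // Same extraction as the Map component: all Polygon rings flattened to [lng, lat] pairs.
  const coords = features.flatMap((feature) =>
    feature.geometry?.type === 'Polygon' ? feature.geometry.coordinates.flat() : [],
  );
  if (coords.length === 0) return null; // nothing to fit; the caller leaves the map as is

  let minLng = Infinity;
  let minLat = Infinity;
  let maxLng = -Infinity;
  let maxLat = -Infinity;
  for (const [lng, lat] of coords) {
    minLng = Math.min(minLng, lng);
    minLat = Math.min(minLat, lat);
    maxLng = Math.max(maxLng, lng);
    maxLat = Math.max(maxLat, lat);
  }
  // South-west corner first, then north-east.
  return [
    [minLng, minLat],
    [maxLng, maxLat],
  ];
}

const savedArea = [
  {
    type: 'Feature',
    geometry: {
      type: 'Polygon',
      coordinates: [[[13.3, 52.4], [13.5, 52.4], [13.5, 52.6], [13.3, 52.4]]],
    },
  },
];
console.log(boundsFromPolygonFeatures(savedArea)); // [ [ 13.3, 52.4 ], [ 13.5, 52.6 ] ]
```

In the component, the bounds go to `fitBounds(bounds, { padding: 50, maxZoom: 15, duration: 0 })`. `maxZoom` stops a small area from zooming in too far, and `duration: 0` jumps to the area without animating.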
View File

@@ -57,6 +57,7 @@ const GeneralSettings = function GeneralSettings() {
const currentUser = useSelector((state) => state.user.currentUser);
const [interval, setInterval] = React.useState('');
+ const [proxyUrl, setProxyUrl] = React.useState('');
const [port, setPort] = React.useState('');
const [workingHourFrom, setWorkingHourFrom] = React.useState(null);
const [workingHourTo, setWorkingHourTo] = React.useState(null);
@@ -91,6 +92,7 @@ const GeneralSettings = function GeneralSettings() {
React.useEffect(() => {
async function init() {
setInterval(settings?.interval);
+ setProxyUrl(settings?.proxyUrl ?? '');
setPort(settings?.port);
setWorkingHourFrom(settings?.workingHours?.from);
setWorkingHourTo(settings?.workingHours?.to);
@@ -133,6 +135,7 @@ const GeneralSettings = function GeneralSettings() {
try {
await xhrPost('/api/admin/generalSettings', {
interval,
+ proxyUrl: proxyUrl?.trim() ?? '',
port,
workingHours: {
from: workingHourFrom,
@@ -376,6 +379,18 @@ const GeneralSettings = function GeneralSettings() {
</div>
</SegmentPart>
+ <SegmentPart
+ name="Proxy URL"
+ helpText="Optional. Routes the scraping browser through a proxy. Server/datacenter IPs are frequently blocked by providers (e.g. immowelt) regardless of browser fingerprint, a German residential proxy makes requests look like a normal household and is the most effective fix. Format: http://user:pass@host:port or socks5://user:pass@host:port. Leave empty to disable."
+ >
+ <Input
+ type="text"
+ placeholder="http://user:pass@host:port"
+ value={proxyUrl}
+ onChange={(value) => setProxyUrl(value)}
+ />
+ </SegmentPart>
<div className="generalSettings__save-row">
<Button type="primary" theme="solid" onClick={handleStore} icon={<IconSave />}>
Save

1929
yarn.lock

File diff suppressed because it is too large.