Compare commits

17 Commits

Author SHA1 Message Date
Christian Kellner
1ecbbdd774 better logging 2025-01-07 13:34:43 +01:00
Christian Kellner
e1db3840f6 adding puppeteer timeout and fixing waitForSelector 2025-01-07 12:37:50 +01:00
Christian Kellner
26127eeac1 updating dependencies 2025-01-07 12:27:16 +01:00
Christian Kellner
90a4ee5dcf better logging, fixing code smells 2025-01-07 12:25:19 +01:00
Christian Kellner
2aaf63c253 Happy New Year 2025-01-05 06:53:07 +01:00
Christian Kellner
f52e3e9fd8 Update package.json 2025-01-04 21:52:06 +01:00
Fabian Pfaff
0d69232395 install chrome via apt instead of bundled (#122) 2025-01-04 21:50:59 +01:00
weakmap@gmail.com
b473cf7fb4 fixing kleinanzeigen test 2024-12-26 19:18:30 +01:00
weakmap@gmail.com
3b8279c714 adding fredy version 2024-12-17 13:07:25 +01:00
Christian Kellner
214e714c03 Puppeteer rewrite (#119)
* Moving to puppeteer | removing scrapingAnt
2024-12-17 12:38:28 +01:00
Christian Kellner
58965a6f1b Running tests at least once a day 2024-12-16 14:06:34 +01:00
weakmap@gmail.com
3c0e9e56c6 fixing immowelt 2024-12-10 09:08:25 +01:00
Christian Kellner
f5d56a6bda version update 2024-12-03 14:25:02 +01:00
Christian Kellner
324b14da50 improving tracking 2024-12-03 14:23:09 +01:00
Christian Kellner
f8f911aa00 improving tracking 2024-12-03 14:05:00 +01:00
Christian Kellner
13b8701447 Update CONTRIBUTING.md 2024-12-02 15:02:36 +01:00
Christian Kellner
e25b956eda Update config.json 2024-11-22 12:32:37 +01:00
36 changed files with 2048 additions and 1262 deletions

View File

@@ -6,6 +6,8 @@ on:
pull_request: pull_request:
branches: branches:
- master - master
schedule:
- cron: '0 12 * * *'
jobs: jobs:
test: test:
name: Test name: Test

View File

@@ -106,9 +106,7 @@ exports.config = {
``` ```
#### Running Tests #### Running Tests
If you've written a new provider you are an awesome person. You know it and I do. If you now write tests for it, you are even more awesome. And who doesn't want to be more awesome right? If you've written a new provider you are an awesome person. If you now write tests for it, you are even more awesome. And who doesn't want to be more awesome right?
To write tests for provider, you need to use Node 8 as the tests are using `async / await`
#### Codestyle #### Codestyle
I'm using Eslint to maintain quote style and quality. Do not skip it... I'm using Eslint to maintain quote style and quality. Do not skip it...

View File

@@ -4,6 +4,11 @@ WORKDIR /fredy
COPY . /fredy COPY . /fredy
RUN apt-get update && apt-get install -y chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
RUN yarn install RUN yarn install
RUN yarn global add pm2 RUN yarn global add pm2
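
The Dockerfile now installs Chromium through apt and points Puppeteer at it via `PUPPETEER_EXECUTABLE_PATH`. Puppeteer reads that variable on its own, so the following is only a sketch of how the same path could be passed explicitly through the `executablePath` launch option. `resolveLaunchOptions` is a hypothetical helper and not part of this change; the `args` list mirrors the one used in the new Puppeteer extractor.

```javascript
// Hypothetical helper: derive puppeteer.launch() options from the environment.
function resolveLaunchOptions(env = process.env) {
  const options = {
    headless: true,
    args: ['--no-sandbox', '--disable-gpu', '--disable-setuid-sandbox'],
  };
  // Only set executablePath when the variable is present, so a locally
  // downloaded browser is still used outside the container.
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  return options;
}

console.log(resolveLaunchOptions({ PUPPETEER_EXECUTABLE_PATH: '/usr/bin/chromium' }));
```

The returned object could then be handed to `puppeteer.launch(...)`; whether that is preferable to relying on the environment variable alone is a judgment call.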

View File

@@ -1,6 +1,6 @@
MIT License MIT License
Copyright (c) 2024 Christian Kellner Copyright (c) 2025 Christian Kellner
Permission is hereby granted, free of charge, to any person obtaining a copy Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal of this software and associated documentation files (the "Software"), to deal

View File

@@ -81,13 +81,8 @@ yarn run test
# Architecture # Architecture
![Architecture](/doc/architecture.jpg "Architecture") ![Architecture](/doc/architecture.jpg "Architecture")
### Immoscout / Immonet / NeubauKompass ### Immoscout
I have added **experimental** support for Immoscout, Immonet and NeubauKompass. They all are somewhat special, because they have decided to secure their service from bots using Re-Capture. Finding a way around this is barely possible. For _Fredy_ to be able to bypass this check, I'm using a service called [ScrapingAnt](https://scrapingant.com/). The trick is to use a headless browser, rotating proxies and (once successfully validated) to re-send the cookies each time. Immoscout has implemented advanced bot detection. I'm actively working on bypassing these measures, but until then, selecting Immoscout as a provider will not return any results. I apologize for the inconvenience. 😉
To be able to use Immoscout / Immonet, you need to create an account at ScrapingAnt. Configure the API key in the "General Settings" tab (visible when logged in as administrator).
The rest will be handled by _Fredy_. Keep in mind, the support is experimental. There might be bugs and you might not always pass the re-capture check, but most of the time it works rather well :)
If you need more than the 1000 API calls allowed per month, I'd suggest opting for a paid account... ScrapingAnt loves OpenSource, therefore they have decided to give all _Fredy_ users a 10% discount by using the code **FREDY10** (Disclaimer: I do not earn any money for recommending their service).
# Analytics # Analytics
Fredy is completely free (and will always remain free). However, it would be a huge help if you'd allow me to collect some analytical data. Fredy is completely free (and will always remain free). However, it would be a huge help if you'd allow me to collect some analytical data.

View File

@@ -1 +1 @@
{"interval":"60","port":9998,"scrapingAnt":{"apiKey":"d","proxy":"datacenter"},"workingHours":{"from":"","to":""},"demoMode":false,"analyticsEnabled":null} {"interval":"60","port":9998,"workingHours":{"from":"","to":""},"demoMode":false,"analyticsEnabled":null}

View File

@@ -1,9 +1,9 @@
import { NoNewListingsWarning } from './errors.js'; import { NoNewListingsWarning } from './errors.js';
import { setKnownListings, getKnownListings } from './services/storage/listingsStorage.js'; import { setKnownListings, getKnownListings } from './services/storage/listingsStorage.js';
import * as notify from './notification/notify.js'; import * as notify from './notification/notify.js';
import xray from './services/scraper.js'; import Extractor from './services/extractor/extractor.js';
import * as scrapingAnt from './services/scrapingAnt.js';
import urlModifier from './services/queryStringMutator.js'; import urlModifier from './services/queryStringMutator.js';
class FredyRuntime { class FredyRuntime {
/** /**
* *
@@ -20,6 +20,7 @@ class FredyRuntime {
this._jobKey = jobKey; this._jobKey = jobKey;
this._similarityCache = similarityCache; this._similarityCache = similarityCache;
} }
execute() { execute() {
return ( return (
//modify the url to make sure search order is correctly set //modify the url to make sure search order is correctly set
@@ -42,56 +43,40 @@ class FredyRuntime {
.catch(this._handleError.bind(this)) .catch(this._handleError.bind(this))
); );
} }
_getListings(url) { _getListings(url) {
const extractor = new Extractor();
return new Promise((resolve, reject) => { return new Promise((resolve, reject) => {
const id = this._providerId; extractor
if (scrapingAnt.needScrapingAnt(id) && !scrapingAnt.isScrapingAntApiKeySet()) { .execute(url, this._providerConfig.waitForSelector)
const error = 'Immoscout or Immonet can only be used with if you have set an apikey for scrapingAnt.'; .then(() => {
/* eslint-disable no-console */ const listings = extractor.parseResponseText(
console.log(error); this._providerConfig.crawlContainer,
/* eslint-enable no-console */ this._providerConfig.crawlFields,
reject(error); url,
return; );
} resolve(listings == null ? [] : listings);
const u = scrapingAnt.needScrapingAnt(id) ? scrapingAnt.transformUrlForScrapingAnt(url, id) : url; })
try { .catch((err) => {
if (this._providerConfig.paginate != null) { reject(err);
xray(u, this._providerConfig.crawlContainer, [this._providerConfig.crawlFields]) /* eslint-disable no-console */
//the first 2 pages should be enough here console.error(err);
.limit(2) /* eslint-enable no-console */
.paginate(this._providerConfig.paginate) });
.then((listings) => {
resolve(listings == null ? [] : listings);
})
.catch((err) => {
reject(err);
console.error(err);
});
} else {
xray(u, this._providerConfig.crawlContainer, [this._providerConfig.crawlFields])
.then((listings) => {
resolve(listings == null ? [] : listings);
})
.catch((err) => {
reject(err);
console.error(err);
});
}
} catch (error) {
reject(error);
console.error(error);
}
}); });
} }
_normalize(listings) { _normalize(listings) {
return listings.map(this._providerConfig.normalize); return listings.map(this._providerConfig.normalize);
} }
_filter(listings) { _filter(listings) {
//only return those where all the fields have been found //only return those where all the fields have been found
const keys = Object.keys(this._providerConfig.crawlFields); const keys = Object.keys(this._providerConfig.crawlFields);
const filteredListings = listings.filter((item) => keys.every((key) => key in item)); const filteredListings = listings.filter((item) => keys.every((key) => key in item));
return filteredListings.filter(this._providerConfig.filter); return filteredListings.filter(this._providerConfig.filter);
} }
_findNew(listings) { _findNew(listings) {
const newListings = listings.filter((o) => getKnownListings(this._jobKey, this._providerId)[o.id] == null); const newListings = listings.filter((o) => getKnownListings(this._jobKey, this._providerId)[o.id] == null);
if (newListings.length === 0) { if (newListings.length === 0) {
@@ -99,6 +84,7 @@ class FredyRuntime {
} }
return newListings; return newListings;
} }
_notify(newListings) { _notify(newListings) {
if (newListings.length === 0) { if (newListings.length === 0) {
throw new NoNewListingsWarning(); throw new NoNewListingsWarning();
@@ -106,6 +92,7 @@ class FredyRuntime {
const sendNotifications = notify.send(this._providerId, newListings, this._notificationConfig, this._jobKey); const sendNotifications = notify.send(this._providerId, newListings, this._notificationConfig, this._jobKey);
return Promise.all(sendNotifications).then(() => newListings); return Promise.all(sendNotifications).then(() => newListings);
} }
_save(newListings) { _save(newListings) {
const currentListings = getKnownListings(this._jobKey, this._providerId) || {}; const currentListings = getKnownListings(this._jobKey, this._providerId) || {};
newListings.forEach((listing) => { newListings.forEach((listing) => {
@@ -114,6 +101,7 @@ class FredyRuntime {
setKnownListings(this._jobKey, this._providerId, currentListings); setKnownListings(this._jobKey, this._providerId, currentListings);
return newListings; return newListings;
} }
_filterBySimilarListings(listings) { _filterBySimilarListings(listings) {
const filteredList = listings.filter((listing) => { const filteredList = listings.filter((listing) => {
const similar = this._similarityCache.hasSimilarEntries(this._jobKey, listing.title); const similar = this._similarityCache.hasSimilarEntries(this._jobKey, listing.title);
@@ -127,8 +115,10 @@ class FredyRuntime {
filteredList.forEach((filter) => this._similarityCache.addCacheEntry(this._jobKey, filter.title)); filteredList.forEach((filter) => this._similarityCache.addCacheEntry(this._jobKey, filter.title));
return filteredList; return filteredList;
} }
_handleError(err) { _handleError(err) {
if (err.name !== 'NoNewListingsWarning') console.error(err); if (err.name !== 'NoNewListingsWarning') console.error(err);
} }
} }
export default FredyRuntime; export default FredyRuntime;
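
The rewritten `_getListings` still wraps a promise chain in `new Promise`. Below is a minimal sketch of the same flow with `async`/`await`, assuming the `Extractor` interface shown in this diff (`execute(url, waitForSelector)` and `parseResponseText(crawlContainer, crawlFields, url)`). `StubExtractor` and the free-standing `getListings` function are hypothetical and exist only so the snippet runs without Puppeteer.

```javascript
// Hypothetical stand-in for Extractor so the sketch runs without a browser.
class StubExtractor {
  async execute(url, waitForSelector) {
    this.requested = { url, waitForSelector };
    return this;
  }

  parseResponseText(crawlContainer, crawlFields, url) {
    // Pretend exactly one listing was found on the page.
    return [{ id: '1', title: 'Example listing', source: url }];
  }
}

// Same flow as _getListings in the diff, written with async/await.
async function getListings(extractor, providerConfig, url) {
  try {
    await extractor.execute(url, providerConfig.waitForSelector);
    const listings = extractor.parseResponseText(
      providerConfig.crawlContainer,
      providerConfig.crawlFields,
      url,
    );
    // parse() returns null when nothing could be parsed; normalise to [].
    return listings == null ? [] : listings;
  } catch (err) {
    console.error(err);
    throw err;
  }
}
```

Because an `async` function already returns a promise, errors propagate through `throw` and the explicit `resolve`/`reject` plumbing is no longer needed.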

View File

@@ -1,12 +1,9 @@
import restana from 'restana'; import restana from 'restana';
import fetch from 'node-fetch';
import * as jobStorage from '../../services/storage/jobStorage.js'; import * as jobStorage from '../../services/storage/jobStorage.js';
import * as userStorage from '../../services/storage/userStorage.js'; import * as userStorage from '../../services/storage/userStorage.js';
import * as immoscoutProvider from '../../provider/immoscout.js';
import { config } from '../../utils.js'; import { config } from '../../utils.js';
import { isAdmin } from '../security.js'; import { isAdmin } from '../security.js';
import {isScrapingAntApiKeySet} from '../../services/scrapingAnt.js'; import { trackDemoJobCreated } from '../../services/tracking/Tracker.js';
import {trackDemoJobCreated} from '../../services/tracking/Tracker.js';
const service = restana(); const service = restana();
const jobRouter = service.newRouter(); const jobRouter = service.newRouter();
function doesJobBelongsToUser(job, req) { function doesJobBelongsToUser(job, req) {
@@ -27,34 +24,14 @@ jobRouter.get('/', async (req, res) => {
res.send(); res.send();
}); });
jobRouter.get('/processingTimes', async (req, res) => { jobRouter.get('/processingTimes', async (req, res) => {
let scrapingAntData = {};
if (isScrapingAntApiKeySet()) {
try {
const response = await fetch(`https://api.scrapingant.com/v2/usage?x-api-key=${config.scrapingAnt.apiKey}`);
scrapingAntData = await response.json();
} catch (Exception) {
console.error('Could not query plan data from scraping ant.', Exception);
}
}
res.body = { res.body = {
interval: config.interval, interval: config.interval,
lastRun: config.lastRun || null, lastRun: config.lastRun || null,
scrapingAntData,
error: scrapingAntData?.detail == null ? null : scrapingAntData?.detail
}; };
res.send(); res.send();
}); });
jobRouter.post('/', async (req, res) => { jobRouter.post('/', async (req, res) => {
const { provider, notificationAdapter, name, blacklist = [], jobId, enabled } = req.body; const { provider, notificationAdapter, name, blacklist = [], jobId, enabled } = req.body;
if (
provider.find((p) => p.id === immoscoutProvider.metaInformation.id) != null &&
(config.scrapingAnt.apiKey == null || config.scrapingAnt.apiKey.length === 0)
) {
res.send(
new Error('To use Immoscout as provider, you need to configure ScrapingAnt first. Please check the readme.')
);
return;
}
try { try {
jobStorage.upsertJob({ jobStorage.upsertJob({
userId: req.session.currentUser, userId: req.session.currentUser,
@@ -72,7 +49,7 @@ jobRouter.post('/', async (req, res) => {
trackDemoJobCreated({ trackDemoJobCreated({
name, name,
provider, provider,
adapter: notificationAdapter adapter: notificationAdapter,
}); });
res.send(); res.send();
}); });

View File

@@ -1,7 +1,6 @@
export const DEFAULT_CONFIG = { export const DEFAULT_CONFIG = {
'interval': '60', 'interval': '60',
'port': 9998, 'port': 9998,
'scrapingAnt': {'apiKey': '', 'proxy': 'datacenter'},
'workingHours': {'from': '', 'to': ''}, 'workingHours': {'from': '', 'to': ''},
'demoMode': false, 'demoMode': false,
'analyticsEnabled': null 'analyticsEnabled': null

View File

@@ -2,14 +2,10 @@ import utils, { buildHash } from '../utils.js';
let appliedBlackList = []; let appliedBlackList = [];
function normalize(o) { function normalize(o) {
let size = `${o.size.replace(' Wohnfläche ', '').trim()}`;
if (o.rooms != null) {
size += ` / / ${o.rooms.trim()}`;
}
const link = `https://www.1a-immobilienmarkt.de/expose/${o.id}.html`; const link = `https://www.1a-immobilienmarkt.de/expose/${o.id}.html`;
const price = normalizePrice(o.price); const price = normalizePrice(o.price);
const id = buildHash(o.id, price); const id = buildHash(o.id, price);
return Object.assign(o, { id, price, size, link }); return Object.assign(o, { id, price, link });
} }
/** /**
@@ -39,12 +35,12 @@ const config = {
url: null, url: null,
crawlContainer: '.tabelle', crawlContainer: '.tabelle',
sortByDateParam: 'sort_type=newest', sortByDateParam: 'sort_type=newest',
waitForSelector: 'body',
crawlFields: { crawlFields: {
id: '.inner_object_data input[name="marker_objekt_id"]@value | int', id: '.inner_object_data input[name="marker_objekt_id"]@value | int',
price: '.tabelle .inner_object_data .single_data_price | removeNewline | trim', price: '.inner_object_data .single_data_price | removeNewline | trim',
size: '.tabelle .inner_object_data .data_boxes div:nth-child(1)', size: '.tabelle .tabelle_inhalt_infos .single_data_box | removeNewline | trim',
rooms: '.tabelle .inner_object_data .data_boxes div:nth-child(2)', title: '.inner_object_data .tabelle_inhalt_titel_black | removeNewline | trim',
title: '.tabelle .inner_object_data .tabelle_inhalt_titel_black | removeNewline | trim',
}, },
normalize: normalize, normalize: normalize,
filter: applyBlacklist, filter: applyBlacklist,

View File

@@ -11,8 +11,9 @@ function normalize(o) {
const price = o.price || 'N/A €'; const price = o.price || 'N/A €';
const title = o.title || 'No title available'; const title = o.title || 'No title available';
const address = o.address || 'No address available'; const address = o.address || 'No address available';
const link = shortenLink(o.link); const shortLink = shortenLink(o.link);
const id = buildHash(parseId(shortenLink(o.link)), o.price); const link = `https://www.immobilien.de/${shortLink}`;
const id = buildHash(parseId(shortLink), o.price);
return Object.assign(o, { id, price, size, title, address, link }); return Object.assign(o, { id, price, size, title, address, link });
} }
function applyBlacklist(o) { function applyBlacklist(o) {
@@ -22,9 +23,11 @@ function applyBlacklist(o) {
} }
const config = { const config = {
url: null, url: null,
crawlContainer: '.estates_list .list_immo a._ref', crawlContainer: '._ref',
sortByDateParam: 'sort_col=*created_ts&sort_dir=desc', sortByDateParam: 'sort_col=*created_ts&sort_dir=desc',
waitForSelector: 'body',
crawlFields: { crawlFields: {
id: '@href', //will be transformed later
price: '.list_entry .immo_preis .label_info', price: '.list_entry .immo_preis .label_info',
size: '.list_entry .flaeche .label_info | removeNewline | trim', size: '.list_entry .flaeche .label_info | removeNewline | trim',
title: '.list_entry .part_text h3 span', title: '.list_entry .part_text h3 span',
@@ -32,7 +35,6 @@ const config = {
link: '@href', link: '@href',
address: '.list_entry .place', address: '.list_entry .place',
}, },
paginate: '.list_immo .blocknav .blocknav_list li.next a@href',
normalize: normalize, normalize: normalize,
filter: applyBlacklist, filter: applyBlacklist,
}; };

View File

@@ -1,12 +1,20 @@
import utils, {buildHash} from '../utils.js'; import utils, { buildHash } from '../utils.js';
let appliedBlackList = []; let appliedBlackList = [];
/**
* Note, Immonet is rly a piece of sh*t. It is using a weird combination of React and some buttons (instead of links),
* so that if somebody clicks the listing, a new page will open with the actual link to the listing. Of course, a scraper
* cannot do this (which is why I always just return the link to the whole list of listings).
* This is not only bad for us, but also bad for ppl with disabilities...
*/
function normalize(o) { function normalize(o) {
const size = o.size != null ? o.size.replace('Wohnfläche ', '') : 'N/A m²'; const size = o.size != null ? o.size.replace('Wohnfläche ', '') : 'N/A m²';
const price = o.price.replace('Kaufpreis ', ''); const price = o.price.replace('Kaufpreis ', '');
const address = o.address.split(' • ')[o.address.split(' • ').length - 1]; const address = o.address.split(' • ')[o.address.split(' • ').length - 1];
const title = o.title || 'No title available'; const title = o.title || 'No title available';
const link = o.id; const link = config.url;
const id = buildHash(o.id.substring(o.id.lastIndexOf('/') + 1, o.id.length), price); const id = buildHash(title, price);
return Object.assign(o, { id, address, price, size, title, link }); return Object.assign(o, { id, address, price, size, title, link });
} }
function applyBlacklist(o) { function applyBlacklist(o) {
@@ -16,16 +24,16 @@ function applyBlacklist(o) {
} }
const config = { const config = {
url: null, url: null,
crawlContainer: '.content-wrapper-tiles .ng-star-inserted', crawlContainer: 'div[data-testid="serp-core-classified-card-testid"]',
sortByDateParam: 'sortby=19', sortByDateParam: 'sortby=19',
waitForSelector: 'div[data-testid="serp-resultscount-testid"]',
crawlFields: { crawlFields: {
id: '.card a@href', id: 'button@title |trim', // immonet is a piece of sh*t. See comment above
title: '.card h3 |trim', title: 'button@title |trim',
price: '.card .has-font-300 .is-bold | trim', price: 'div[data-testid="cardmfe-price-testid"] | trim',
size: '.card .has-font-300 .ml-100 | trim', size: 'div[data-testid="cardmfe-keyfacts-testid"] | trim',
address: '.card span:nth-child(2) | trim', address: 'div[data-testid="cardmfe-description-box-address"] | trim',
}, },
paginate: '#idResultList .margin-bottom-6.margin-bottom-sm-12 .panel a.pull-right@href',
normalize: normalize, normalize: normalize,
filter: applyBlacklist, filter: applyBlacklist,
}; };

View File

@@ -17,6 +17,7 @@ const config = {
url: null, url: null,
crawlContainer: '#resultListItems li.result-list__listing', crawlContainer: '#resultListItems li.result-list__listing',
sortByDateParam: 'sorting=2', sortByDateParam: 'sorting=2',
waitForSelector: 'body',
crawlFields: { crawlFields: {
id: '.result-list-entry@data-obid | int', id: '.result-list-entry@data-obid | int',
price: '.result-list-entry .result-list-entry__criteria .grid-item:first-child dd | removeNewline | trim', price: '.result-list-entry .result-list-entry__criteria .grid-item:first-child dd | removeNewline | trim',
@@ -25,7 +26,6 @@ const config = {
link: '.result-list-entry .result-list-entry__brand-title-container@href', link: '.result-list-entry .result-list-entry__brand-title-container@href',
address: '.result-list-entry .result-list-entry__map-link', address: '.result-list-entry .result-list-entry__map-link',
}, },
paginate: '#pager .align-right a@href',
normalize: normalize, normalize: normalize,
filter: applyBlacklist, filter: applyBlacklist,
}; };

View File

@@ -23,6 +23,7 @@ const config = {
url: null, url: null,
crawlContainer: '.js-serp-item', crawlContainer: '.js-serp-item',
sortByDateParam: 's=most_recently_updated_first', sortByDateParam: 's=most_recently_updated_first',
waitForSelector: 'body',
crawlFields: { crawlFields: {
id: '.js-bookmark-btn@data-id', id: '.js-bookmark-btn@data-id',
price: 'div.align-items-start div:first-child | trim', price: 'div.align-items-start div:first-child | trim',
@@ -31,7 +32,6 @@ const config = {
link: '.ci-search-result__link@href', link: '.ci-search-result__link@href',
description: '.js-show-more-item-sm | removeNewline | trim', description: '.js-show-more-item-sm | removeNewline | trim',
}, },
paginate: 'li.page-item.pagination__item a.page-link@href',
normalize: normalize, normalize: normalize,
filter: applyBlacklist, filter: applyBlacklist,
}; };

View File

@@ -16,17 +16,17 @@ function applyBlacklist(o) {
const config = { const config = {
url: null, url: null,
crawlContainer: crawlContainer:
'div[data-testid="serp-card-testid"]:not(div[data-testid="serp-enlargementlist-testid"] div[data-testid="serp-card-testid"])', 'div[data-testid="serp-core-scrollablelistview-testid"]:not(div[data-testid="serp-enlargementlist-testid"] div[data-testid="serp-card-testid"]) div[data-testid="serp-core-classified-card-testid"]',
sortByDateParam: 'order=DateDesc', sortByDateParam: 'order=DateDesc',
waitForSelector: 'div[data-testid="serp-gridcontainer-testid"]',
crawlFields: { crawlFields: {
id: 'a@id', id: 'a@href',
price: 'div[data-testid="cardmfe-price-testid"] | removeNewline | trim', price: 'div[data-testid="cardmfe-price-testid"] | removeNewline | trim',
size: 'div[data-testid="cardmfe-keyfacts-testid"] | removeNewline | trim', size: 'div[data-testid="cardmfe-keyfacts-testid"] | removeNewline | trim',
title: '.css-1cbj9xw', title: '.css-1cbj9xw',
link: 'a@href', link: 'a@href',
address: 'div[data-testid="cardmfe-description-box-address"] | removeNewline | trim', address: 'div[data-testid="cardmfe-description-box-address"] | removeNewline | trim',
}, },
paginate: '#pnlPaging #nlbPlus@href',
normalize: normalize, normalize: normalize,
filter: applyBlacklist, filter: applyBlacklist,
}; };

View File

@@ -6,7 +6,8 @@ let appliedBlacklistedDistricts = [];
function normalize(o) { function normalize(o) {
const size = o.size || '--- m²'; const size = o.size || '--- m²';
const id = buildHash(o.id, o.price); const id = buildHash(o.id, o.price);
return Object.assign(o, {id, size}); const link = `https://www.kleinanzeigen.de${o.link}`;
return Object.assign(o, {id, size, link});
} }
function applyBlacklist(o) { function applyBlacklist(o) {
@@ -14,7 +15,7 @@ function applyBlacklist(o) {
const descNotBlacklisted = !utils.isOneOf(o.description, appliedBlackList); const descNotBlacklisted = !utils.isOneOf(o.description, appliedBlackList);
const isBlacklistedDistrict = const isBlacklistedDistrict =
appliedBlacklistedDistricts.length === 0 ? false : utils.isOneOf(o.description, appliedBlacklistedDistricts); appliedBlacklistedDistricts.length === 0 ? false : utils.isOneOf(o.description, appliedBlacklistedDistricts);
return !isBlacklistedDistrict && titleNotBlacklisted && descNotBlacklisted; return o.title != null && !isBlacklistedDistrict && titleNotBlacklisted && descNotBlacklisted;
} }
const config = { const config = {
@@ -22,16 +23,16 @@ const config = {
crawlContainer: '#srchrslt-adtable .ad-listitem ', crawlContainer: '#srchrslt-adtable .ad-listitem ',
//sort by date is standard oO //sort by date is standard oO
sortByDateParam: null, sortByDateParam: null,
waitForSelector: 'body',
crawlFields: { crawlFields: {
id: '.aditem@data-adid | int', id: '.aditem@data-adid | int',
price: '.aditem-main--middle--price-shipping--price | removeNewline | trim', price: '.aditem-main--middle--price-shipping--price | removeNewline | trim',
size: '.aditem-main .text-module-end span:nth-child(2) | removeNewline | trim', size: '.aditem-main .text-module-end | removeNewline | trim',
title: '.aditem-main .text-module-begin a | removeNewline | trim', title: '.aditem-main .text-module-begin a | removeNewline | trim',
link: '.aditem-main .text-module-begin a@href | removeNewline | trim', link: '.aditem-main .text-module-begin a@href | removeNewline | trim',
description: '.aditem-main p:not(.text-module-end) | removeNewline | trim', description: '.aditem-main .aditem-main--middle--description | removeNewline | trim',
address: '.aditem-main--top--left | trim | removeNewline', address: '.aditem-main--top--left | trim | removeNewline',
}, },
paginate: '#srchrslt-pagination .pagination-next@href',
normalize: normalize, normalize: normalize,
filter: applyBlacklist, filter: applyBlacklist,
}; };
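
The new `normalize` builds the listing link by concatenating the site origin with the scraped `href`. Here is a small sketch of the same idea using the WHATWG `URL` constructor, which also copes with an `href` that is already absolute. `toAbsoluteLink` is a hypothetical helper, not something this change introduces.

```javascript
const BASE_URL = 'https://www.kleinanzeigen.de';

// Hypothetical helper: resolve a scraped href against the site origin.
function toAbsoluteLink(href, base = BASE_URL) {
  if (href == null || href.length === 0) return null;
  // Relative paths are resolved against base; absolute URLs pass through.
  return new URL(href, base).href;
}

console.log(toAbsoluteLink('/s-anzeige/example/123'));
// prints "https://www.kleinanzeigen.de/s-anzeige/example/123"
```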

View File

@@ -8,7 +8,7 @@ function nullOrEmpty(val) {
function normalize(o) { function normalize(o) {
const link = nullOrEmpty(o.link) ? 'NO LINK' : `https://www.neubaukompass.de${o.link.substring(o.link.indexOf('/neubau'))}`; const link = nullOrEmpty(o.link) ? 'NO LINK' : `https://www.neubaukompass.de${o.link.substring(o.link.indexOf('/neubau'))}`;
const id = buildHash(o.id, o.price); const id = buildHash(o.link, o.price);
return Object.assign(o, {id, link}); return Object.assign(o, {id, link});
} }
@@ -18,16 +18,16 @@ function applyBlacklist(o) {
const config = { const config = {
url: null, url: null,
crawlContainer: '.nbk-container >div article', crawlContainer: '.col-12.mb-4',
sortByDateParam: 'Sortierung=Id&Richtung=DESC', sortByDateParam: 'Sortierung=Id&Richtung=DESC',
waitForSelector: '.nbk-section',
crawlFields: { crawlFields: {
id: '@id', id: 'a@href',
title: 'a.nbk-truncate@title | removeNewline | trim', title: 'a@title | removeNewline | trim',
link: 'a.nbk-truncate@href', link: 'a@href',
address: 'p.nbk-truncate | removeNewline | trim', address: '.nbk-project-card__description | removeNewline | trim',
price: 'p.nbk-mb-0 | removeNewline | trim', price: '.nbk-project-card__spec-item .nbk-project-card__spec-value | removeNewline | trim',
}, },
paginate: '.numbered-pager__bottom .numbered-pager--info li:nth-child(2) a@href',
normalize: normalize, normalize: normalize,
filter: applyBlacklist, filter: applyBlacklist,
}; };

View File

@@ -17,6 +17,7 @@ const config = {
url: null, url: null,
crawlContainer: '#main_column .wgg_card', crawlContainer: '#main_column .wgg_card',
sortByDateParam: 'sort_column=0&sort_order=0', sortByDateParam: 'sort_column=0&sort_order=0',
waitForSelector: 'body',
crawlFields: { crawlFields: {
id: '@data-id', id: '@data-id',
details: '.row .noprint .col-xs-11 |removeNewline |trim', details: '.row .noprint .col-xs-11 |removeNewline |trim',

View File

@@ -0,0 +1,43 @@
import { setDebug } from './utils.js';
import puppeteerExtractor from './puppeteerExtractor.js';
import { loadParser, parse } from './parser/parser.js';
const DEFAULT_OPTIONS = {
debug: false,
puppeteerTimeout: 60_000,
puppeteerHeadless: true,
};
export default class Extractor {
constructor(options) {
this.options = {
...DEFAULT_OPTIONS,
...options,
};
this.responseText = null;
setDebug(this.options);
}
/**
* if you are extracting data from a SPA, you must provide a selector, otherwise
* your response will never contain what you are really looking for
* @param url
* @param waitForSelector
*/
execute = async (url, waitForSelector = null) => {
this.responseText = null;
try {
this.responseText = await puppeteerExtractor(url, waitForSelector, this.options);
if (this.responseText != null) {
loadParser(this.responseText);
}
} catch (error) {
console.error('Error trying to load page.', error);
}
return this;
};
parseResponseText = (crawlContainer, crawlFields, url) => {
return parse(crawlContainer, crawlFields, this.responseText, url);
};
}
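
The constructor merges caller options over `DEFAULT_OPTIONS` with object spread. The sketch below shows how that merge behaves, including the edge case where a caller passes an explicit `undefined`. `mergeOptions` is a hypothetical name used only for this illustration.

```javascript
const DEFAULT_OPTIONS = {
  debug: false,
  puppeteerTimeout: 60_000,
  puppeteerHeadless: true,
};

// Hypothetical helper mirroring the spread merge in the Extractor constructor.
function mergeOptions(options) {
  // Spreading undefined is allowed and contributes no properties.
  return { ...DEFAULT_OPTIONS, ...options };
}

console.log(mergeOptions({ debug: true }));
console.log(mergeOptions());
// An explicit undefined replaces the default value with undefined.
console.log(mergeOptions({ puppeteerTimeout: undefined }));
```

In the last case the merged `puppeteerTimeout` is `undefined`, so the `options.puppeteerTimeout || 30_000` fallback in the Puppeteer extractor would apply rather than the 60 second default.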

View File

@@ -0,0 +1,97 @@
import * as cheerio from 'cheerio';
let $ = null;
export function loadParser(text) {
$ = cheerio.load(text);
}
export function parse(crawlContainer, crawlFields, text, url) {
if (!text) {
console.warn('Cannot parse, text was empty for url ', url);
return null;
}
if (!crawlContainer || !crawlFields) {
console.warn('Cannot parse, selector was empty for url ', url);
return null;
}
const result = [];
if ($(crawlContainer).length === 0) {
console.warn('No elements in crawl container found for url ', url);
return null;
}
$(crawlContainer).each((_, element) => {
const container = $(element);
const parsedObject = {};
// Parse fields based on crawlFields
for (const [key, fieldSelector] of Object.entries(crawlFields)) {
let value;
try {
const selector = fieldSelector.includes('|')
? fieldSelector.substring(0, fieldSelector.indexOf('|')).trim()
: fieldSelector;
if (selector.includes('@')) {
const [sel, attr] = selector.split('@');
if (sel.length === 0) {
value = container.attr(attr.trim());
} else {
value = container.find(sel.trim()).attr(attr.trim());
}
} else {
value = container.find(selector.trim()).text();
}
// Apply modifiers if specified
if (fieldSelector.includes('|')) {
/* eslint-disable no-unused-vars */
const [_, ...modifiers] = fieldSelector.split('|').map((s) => s.trim());
/* eslint-enable no-unused-vars */
value = applyModifiers(value, modifiers);
}
parsedObject[key] = value || null;
} catch (error) {
console.error(`Error parsing field '${key}' with selector '${fieldSelector}':`, error);
parsedObject[key] = null;
}
}
if (parsedObject.id != null) {
result.push(parsedObject);
} else {
console.warn('ID not found. Not relaying object.');
}
});
return result;
}
// Helper function to apply modifiers
function applyModifiers(value, modifiers) {
if (!value) return value;
modifiers.forEach((modifier) => {
switch (modifier) {
case 'int':
value = parseInt(value, 10);
break;
case 'trim':
value = value.replace(/\s+/g, ' ').trim();
break;
case 'removeNewline':
value = value.replace(/\n/g, ' ');
break;
default:
console.warn(`Unknown modifier: ${modifier}`);
}
});
return value;
}
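
The parser above interprets each `crawlFields` entry as `selector@attribute | modifier | ...`. The following sketch isolates that mini-syntax without cheerio so it can be tried on its own. `parseFieldSpec` is a hypothetical helper, and `applyModifiers` is rewritten here with `reduce` but is meant to keep the same modifier semantics as the version in the diff, except that unknown modifiers are skipped silently.

```javascript
// Split an entry such as 'a.link@href | removeNewline | trim' into its
// CSS selector, optional attribute name, and list of modifiers.
function parseFieldSpec(fieldSpec) {
  const [selectorPart, ...modifiers] = fieldSpec.split('|').map((s) => s.trim());
  const atIndex = selectorPart.indexOf('@');
  if (atIndex === -1) {
    return { selector: selectorPart, attribute: null, modifiers };
  }
  return {
    // An empty selector means "read the attribute from the container itself".
    selector: selectorPart.slice(0, atIndex).trim(),
    attribute: selectorPart.slice(atIndex + 1).trim(),
    modifiers,
  };
}

// Apply modifiers left to right; falsy values are returned unchanged.
function applyModifiers(value, modifiers) {
  if (!value) return value;
  return modifiers.reduce((current, modifier) => {
    switch (modifier) {
      case 'int':
        return parseInt(current, 10);
      case 'trim':
        return current.replace(/\s+/g, ' ').trim();
      case 'removeNewline':
        return current.replace(/\n/g, ' ');
      default:
        return current; // unknown modifiers are ignored in this sketch
    }
  }, value);
}

const spec = parseFieldSpec('a.link@href | removeNewline | trim');
console.log(spec);
console.log(applyModifiers('  two\nlines ', spec.modifiers));
```

Order matters: `int` turns the value into a number, so placing a string modifier such as `trim` after it would fail.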

View File

@@ -0,0 +1,49 @@
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import { debug, DEFAULT_HEADER, botDetected } from './utils.js';
puppeteer.use(StealthPlugin());
export default async function execute(url, waitForSelector, options) {
let browser;
try {
debug(`Sending request to ${url} using Puppeteer.`);
browser = await puppeteer.launch({
headless: options.puppeteerHeadless ?? true,
args: ['--no-sandbox', '--disable-gpu', '--disable-setuid-sandbox'],
timeout: options.puppeteerTimeout || 30_000,
});
let page = await browser.newPage();
await page.setExtraHTTPHeaders(DEFAULT_HEADER);
const response = await page.goto(url, {
waitUntil: 'domcontentloaded',
});
let pageSource;
//if we're extracting data from a spa, we must wait for the selector
if (waitForSelector != null) {
await page.waitForSelector(waitForSelector);
pageSource = await page.evaluate((selector) => {
return document.querySelector(selector).innerHTML;
}, waitForSelector);
} else {
pageSource = await page.content();
}
const statusCode = response.status();
if (botDetected(pageSource, statusCode)) {
console.warn('We have been detected as a bot :-/ Tried url: => ', url);
return null;
}
return await page.content();
} catch (error) {
console.error('Error executing with puppeteer executor', error);
return null;
} finally {
if (browser != null) {
await browser.close();
}
}
}
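
Review note: `execute` reads `options.puppeteerHeadless` directly, so calling it without an `options` argument throws inside the `try` and the function returns `null`. The sketch below shows one way to resolve the launch options defensively. `resolveLaunchOptions` is a hypothetical helper, not part of this diff. The defaults are copied from the executor above. As far as I know, Puppeteer's `timeout` launch option only bounds browser startup, and `0` disables it.

```javascript
// Hypothetical helper, not part of the diff: resolves the launch options with
// the same defaults as the executor, but tolerates a missing options object.
function resolveLaunchOptions(options) {
  return {
    // `??` keeps an explicit `false`, so a visible browser can still be requested.
    headless: options?.puppeteerHeadless ?? true,
    args: ['--no-sandbox', '--disable-gpu', '--disable-setuid-sandbox'],
    // `||` also replaces 0, so the startup timeout can never be disabled by accident.
    timeout: options?.puppeteerTimeout || 30_000,
  };
}

console.log(resolveLaunchOptions());
console.log(resolveLaunchOptions({ puppeteerHeadless: false, puppeteerTimeout: 5_000 }));
```

The result could be passed to `puppeteer.launch(...)` unchanged. The mix of `??` and `||` is kept deliberately here because it matches the behaviour of the code under review.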

@@ -0,0 +1,32 @@
let debuggingOn = false;
export const DEFAULT_HEADER = {
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
Connection: 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'User-Agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
};
export const setDebug = (options) => {
debuggingOn = !!options?.debug;
};
export const debug = (message) => {
if (debuggingOn) {
/* eslint-disable no-console */
console.debug(message);
/* eslint-enable no-console */
}
};
export const botDetected = (pageSource, statusCode) => {
const suspiciousStatusCodes = [403, 429];
const botDetectionPatterns = [/verify you are human/i, /access denied/i, /x-amz-cf-id/i];
const detectedInSource = botDetectionPatterns.some((pattern) => pattern.test(pageSource));
const detectedByStatus = suspiciousStatusCodes.includes(statusCode);
return detectedInSource || detectedByStatus;
};
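
Review note: `botDetected` flags a response either by status code or by a pattern in the page source. To make the snippet below runnable on its own, the function is copied from the diff above. The sample inputs are made up for illustration.

```javascript
// Copy of botDetected from the diff, so this snippet runs on its own.
const botDetected = (pageSource, statusCode) => {
  const suspiciousStatusCodes = [403, 429];
  const botDetectionPatterns = [/verify you are human/i, /access denied/i, /x-amz-cf-id/i];
  const detectedInSource = botDetectionPatterns.some((pattern) => pattern.test(pageSource));
  const detectedByStatus = suspiciousStatusCodes.includes(statusCode);
  return detectedInSource || detectedByStatus;
};

// Made-up sample responses:
console.log(botDetected('<html>Please verify you are human</html>', 200)); // true (pattern match)
console.log(botDetected('<html>some listing</html>', 429)); // true (status code)
console.log(botDetected('<html>some listing</html>', 200)); // false
```

Because the patterns are matched against the whole page source, a listing whose text happens to contain "access denied" would also be flagged. That trade-off seems acceptable for a heuristic, but it is worth keeping in mind when a provider suddenly returns no results.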

@@ -1,77 +0,0 @@
import fetch from 'node-fetch';
import { config } from '../utils.js';
import { makeUrlResidential } from './scrapingAnt.js';
import https from 'https';
//if ScrapingAnt got blocked, this http status is returned
const BLOCKED_HTTP_STATUS = 423;
const NOT_FOUND_HTTP_STATUS = 404;
const MAX_RETRIES_SCRAPING_ANT = 10;
const EXPECTED_STATUS_CODES = [BLOCKED_HTTP_STATUS, NOT_FOUND_HTTP_STATUS];
const agent = new https.Agent({
rejectUnauthorized: false,
});
function makeDriver(headers = {}) {
let cookies = '';
async function scrapingAntDriver(context, callback, retryCounter = 0) {
const proxyType = config.scrapingAnt?.proxy || 'datacenter';
try {
const url = proxyType === 'residential' ? makeUrlResidential(context.url) : context.url;
const response = await fetch(url, {
headers: {
...headers,
cookie: cookies,
},
});
const result = await response.text();
if (EXPECTED_STATUS_CODES.includes(response.status)) {
throw new Error(`${response.status}`);
}
if (cookies.length === 0) {
cookies = response.headers.raw()['set-cookie'] || [];
}
callback(null, result);
} catch (exception) {
/* eslint-disable no-console */
if (!EXPECTED_STATUS_CODES.includes(exception.response?.status) && !EXPECTED_STATUS_CODES.includes(Number(exception.message))) {
console.error(`Error while trying to scrape data from scraping ant. Received error: ${exception.message}`);
callback(null, []);
return;
}
if (retryCounter <= MAX_RETRIES_SCRAPING_ANT) {
retryCounter++;
console.debug(`ScrapingAnt got blocked. Retrying ${retryCounter} / ${MAX_RETRIES_SCRAPING_ANT}`);
await scrapingAntDriver(context, callback, retryCounter);
} else {
console.error(`Error while trying to scrape data from scraping ant. Received error: ${exception.message}`);
callback(null, []);
}
/* eslint-enable no-console */
}
}
/**
* The regular request driver takes care of everything that doesn't need to be scraped by ScrapingAnt (which is
* everything != Immoscout & Immonet as of writing this)
*/
return async function driver(context, callback) {
if (context.url.toLowerCase().indexOf('scrapingant') !== -1) {
return scrapingAntDriver(context, callback);
}
try {
const response = await fetch(context.url, {
headers: {
...headers,
Cookie: cookies,
},
agent,
});
const result = await response.text();
callback(null, result);
} catch (exception) {
console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
callback(null, []);
}
};
}
export default makeDriver;

@@ -1,36 +0,0 @@
import { config } from '../utils.js';
import makeDriver from './requestDriver.js';
import Xray from 'x-ray';
class Scraper {
constructor() {
const filters = {
removeNewline: this._removeNewline,
trim: this._trim,
int: this._int,
};
const headers = {
'User-Agent':
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36',
};
if (config.scrapingAnt != null && config.scrapingAnt.apiKey != null) {
headers['x-api-key'] = config.scrapingAnt.apiKey;
}
const driver = makeDriver(headers);
const xray = Xray({ filters });
xray.driver(driver);
this.xray = xray;
}
get x() {
return this.xray;
}
_removeNewline(value) {
return typeof value === 'string' ? value.replace(/\\n/g, '') : value;
}
_trim(value) {
return typeof value === 'string' ? value.replace(/\s+/g, ' ').trim() : value;
}
_int(value) {
return typeof value === 'string' ? parseInt(value, 10) : value;
}
}
export default new Scraper().x;

@@ -1,30 +0,0 @@
import { metaInformation as immoScoutInfo } from '../provider/immoscout.js';
import { metaInformation as immoNetInfo } from '../provider/immonet.js';
import { metaInformation as neuBauCompassInfo } from '../provider/neubauKompass.js';
import { config } from '../utils.js';
const additionalImmonetUrlParams = `&wait_for_selector=.content-wrapper-tiles&js_snippet=${Buffer.from(
'window.scrollTo(0,document.body.scrollHeight);'
).toString('base64')}`;
const needScrapingAnt = (id) => {
return id.toLowerCase() === immoScoutInfo.id || id.toLowerCase() === immoNetInfo.id || id.toLowerCase() === neuBauCompassInfo.id.toLowerCase();
};
export const transformUrlForScrapingAnt = (url, id) => {
let urlParams = '';
if (needScrapingAnt(id)) {
if (id.toLowerCase() === immoNetInfo.id) {
urlParams = additionalImmonetUrlParams;
}
//only do calls to scrapingAnt when dealing with Immoscout/Immonet
url = `https://api.scrapingant.com/v2/general?url=${encodeURIComponent(url)}&proxy_type=datacenter${urlParams}`;
}
return url;
};
export const isScrapingAntApiKeySet = () => {
return config.scrapingAnt != null && config.scrapingAnt.apiKey != null && config.scrapingAnt.apiKey.length > 8;
};
export const makeUrlResidential = (url) => {
return url.replace('datacenter', 'residential');
};
export { needScrapingAnt };

@@ -1,33 +1,40 @@
import Mixpanel from 'mixpanel'; import Mixpanel from 'mixpanel';
import {getJobs} from '../storage/jobStorage.js'; import {getJobs} from '../storage/jobStorage.js';
import {getUniqueId} from './uniqueId.js';
import {config, inDevMode} from '../../utils.js'; import {config, inDevMode} from '../../utils.js';
import os from 'os';
import {readFileSync} from 'fs';
import {packageUp} from 'package-up';
const mixpanelTracker = Mixpanel.init('718670ef1c58c0208256c1e408a3d75e'); const mixpanelTracker = Mixpanel.init('718670ef1c58c0208256c1e408a3d75e');
const distinct_id = getUniqueId() || 'N/A';
const version = await getPackageVersion();
export const track = function () { export const track = function () {
//only send tracking information if the user allowed to do so. //only send tracking information if the user allowed to do so.
if (config.analyticsEnabled && !inDevMode()) { if (config.analyticsEnabled && !inDevMode()) {
const activeProvider = new Set(); const activeProvider = new Set();
const activeAdapter = new Set(); const activeAdapter = new Set();
const jobs = getJobs(); const jobs = getJobs();
if (jobs != null && jobs.length > 0) { if (jobs != null && jobs.length > 0) {
jobs.forEach(job => { jobs.forEach((job) => {
job.provider.forEach(provider => { job.provider.forEach((provider) => {
activeProvider.add(provider.id); activeProvider.add(provider.id);
}); });
job.notificationAdapter.forEach(adapter => { job.notificationAdapter.forEach((adapter) => {
activeAdapter.add(adapter.id); activeAdapter.add(adapter.id);
}); });
}); });
mixpanelTracker.track('fredy_tracking', enrichTrackingObject({ mixpanelTracker.track(
adapter: Array.from(activeAdapter), 'fredy_tracking',
provider: Array.from(activeProvider), enrichTrackingObject({
})); adapter: Array.from(activeAdapter),
provider: Array.from(activeProvider),
}),
);
} }
} }
}; };
@@ -50,9 +57,9 @@ export function trackDemoAccessed() {
} }
} }
function enrichTrackingObject(trackingObject) { function enrichTrackingObject(trackingObject) {
const platform = process.platform; const operating_system = os.platform();
const os_version = os.release();
const arch = process.arch; const arch = process.arch;
const language = process.env.LANG || 'en'; const language = process.env.LANG || 'en';
const nodeVersion = process.version || 'N/A'; const nodeVersion = process.version || 'N/A';
@@ -60,9 +67,24 @@ function enrichTrackingObject(trackingObject) {
return { return {
...trackingObject, ...trackingObject,
isDemo: config.demoMode, isDemo: config.demoMode,
platform, operating_system,
os_version,
arch, arch,
nodeVersion, nodeVersion,
language language,
distinct_id,
fredy_version: version
}; };
} }
async function getPackageVersion() {
try {
const packagePath = await packageUp();
const packageJson = readFileSync(packagePath, 'utf8');
const json = JSON.parse(packageJson);
return json.version;
} catch (error) {
console.error('Error reading version from package.json', error);
}
return 'N/A';
}

@@ -0,0 +1,19 @@
import { hostname, arch, cpus, platform } from 'os';
import { createHash } from 'crypto';
/**
* Don't worry, we are not evil ;) We however need a unique id per running instance
* @returns {string}
*/
export const getUniqueId = () => {
const systemInfo = {
hostname: hostname(),
architecture: arch(),
cpuCount: cpus().length,
platform: platform(),
};
const baseData = JSON.stringify(systemInfo);
return createHash('sha256').update(baseData).digest('hex');
};

@@ -1,6 +1,6 @@
{ {
"name": "fredy", "name": "fredy",
"version": "10.4.1", "version": "11.0.1",
"description": "[F]ind [R]eal [E]states [d]amn eas[y].", "description": "[F]ind [R]eal [E]states [d]amn eas[y].",
"scripts": { "scripts": {
"start": "node prod.js", "start": "node prod.js",
@@ -50,28 +50,33 @@
"Firefox ESR" "Firefox ESR"
], ],
"dependencies": { "dependencies": {
"@douyinfe/semi-ui": "2.69.2", "@douyinfe/semi-ui": "2.72.3",
"@rematch/core": "2.2.0", "@rematch/core": "2.2.0",
"@rematch/loading": "2.1.2", "@rematch/loading": "2.1.2",
"@sendgrid/mail": "8.1.4", "@sendgrid/mail": "8.1.4",
"@vitejs/plugin-react": "4.3.3", "@vitejs/plugin-react": "4.3.4",
"better-sqlite3": "^11.5.0", "better-sqlite3": "^11.7.2",
"body-parser": "1.20.3", "body-parser": "1.20.3",
"cheerio": "^1.0.0",
"cookie-session": "2.1.0", "cookie-session": "2.1.0",
"handlebars": "4.7.8", "handlebars": "4.7.8",
"highcharts": "11.4.8", "highcharts": "12.1.2",
"highcharts-react-official": "3.2.1", "highcharts-react-official": "3.2.1",
"lodash": "4.17.21", "lodash": "4.17.21",
"lowdb": "6.0.1", "lowdb": "6.0.1",
"markdown": "^0.5.0", "markdown": "^0.5.0",
"mixpanel": "^0.18.0", "mixpanel": "^0.18.0",
"nanoid": "5.0.8", "nanoid": "5.0.9",
"node-fetch": "3.3.2", "node-fetch": "3.3.2",
"node-mailjet": "6.0.6", "node-mailjet": "6.0.6",
"package-up": "^5.0.0",
"puppeteer": "^23.11.1",
"puppeteer-extra": "^3.3.6",
"puppeteer-extra-plugin-stealth": "^2.11.2",
"query-string": "9.1.1", "query-string": "9.1.1",
"react": "18.3.1", "react": "18.3.1",
"react-dom": "18.3.1", "react-dom": "18.3.1",
"react-redux": "9.1.2", "react-redux": "9.2.0",
"react-router": "5.2.1", "react-router": "5.2.1",
"react-router-dom": "5.3.0", "react-router-dom": "5.3.0",
"redux": "5.0.1", "redux": "5.0.1",
@@ -80,25 +85,24 @@
"serve-static": "1.16.2", "serve-static": "1.16.2",
"slack": "11.0.2", "slack": "11.0.2",
"string-similarity": "^4.0.4", "string-similarity": "^4.0.4",
"vite": "5.4.11", "vite": "5.4.11"
"x-ray": "2.3.4"
}, },
"devDependencies": { "devDependencies": {
"@babel/core": "7.26.0", "@babel/core": "7.26.0",
"@babel/eslint-parser": "7.25.9", "@babel/eslint-parser": "7.25.9",
"@babel/preset-env": "7.26.0", "@babel/preset-env": "7.26.0",
"@babel/preset-react": "7.25.9", "@babel/preset-react": "7.26.3",
"chai": "5.1.2", "chai": "5.1.2",
"eslint": "8.56.0", "eslint": "8.56.0",
"eslint-config-prettier": "8.8.0", "eslint-config-prettier": "8.8.0",
"eslint-plugin-react": "7.37.2", "eslint-plugin-react": "7.37.3",
"esmock": "2.6.9", "esmock": "2.6.9",
"history": "5.3.0", "history": "5.3.0",
"husky": "9.1.7", "husky": "9.1.7",
"less": "4.2.0", "less": "4.2.1",
"lint-staged": "15.2.10", "lint-staged": "15.3.0",
"mocha": "10.8.2", "mocha": "10.8.2",
"prettier": "3.3.3", "prettier": "3.4.2",
"redux-logger": "3.0.6" "redux-logger": "3.0.6"
} }
} }

@@ -3,7 +3,6 @@ import { get } from '../mocks/mockNotification.js';
import { mockFredy, providerConfig } from '../utils.js'; import { mockFredy, providerConfig } from '../utils.js';
import { expect } from 'chai'; import { expect } from 'chai';
import * as provider from '../../lib/provider/immonet.js'; import * as provider from '../../lib/provider/immonet.js';
import * as scrapingAnt from '../../lib/services/scrapingAnt.js';
describe('#immonet testsuite()', () => { describe('#immonet testsuite()', () => {
after(() => { after(() => {
@@ -13,13 +12,6 @@ describe('#immonet testsuite()', () => {
it('should test immonet provider', async () => { it('should test immonet provider', async () => {
const Fredy = await mockFredy(); const Fredy = await mockFredy();
return await new Promise((resolve) => { return await new Promise((resolve) => {
if (!scrapingAnt.isScrapingAntApiKeySet()) {
/* eslint-disable no-console */
console.info('Skipping Immonet test as ScrapingAnt Api Key is not set.');
/* eslint-enable no-console */
resolve();
return;
}
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'immonet', similarityCache); const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'immonet', similarityCache);
fredy.execute().then((listing) => { fredy.execute().then((listing) => {
expect(listing).to.be.a('array'); expect(listing).to.be.a('array');

@@ -1,48 +1,43 @@
import * as similarityCache from '../../lib/services/similarity-check/similarityCache.js'; import * as similarityCache from '../../lib/services/similarity-check/similarityCache.js';
import { get } from '../mocks/mockNotification.js'; //import {get} from '../mocks/mockNotification.js';
import { mockFredy, providerConfig } from '../utils.js'; import {/*mockFredy, */providerConfig} from '../utils.js';
import { expect } from 'chai'; //import {expect} from 'chai';
import * as provider from '../../lib/provider/immoscout.js'; import * as provider from '../../lib/provider/immoscout.js';
import * as scrapingAnt from '../../lib/services/scrapingAnt.js';
describe('#immoscout testsuite()', () => { describe('#immoscout testsuite()', () => {
after(() => { after(() => {
similarityCache.stopCacheCleanup(); similarityCache.stopCacheCleanup();
}); });
provider.init(providerConfig.immoscout, [], []); provider.init(providerConfig.immoscout, [], []);
it('should test immoscout provider', async () => { it('should test immoscout provider', async () => {
const Fredy = await mockFredy(); //const Fredy = await mockFredy();
return await new Promise((resolve) => { return await new Promise((resolve) => {
if (!scrapingAnt.isScrapingAntApiKeySet()) { /* eslint-disable no-console */
/* eslint-disable no-console */ console.info('Skipping Immoscout test for now until we figure out how to bypass bot detection.');
console.info('Skipping Immoscout test as ScrapingAnt Api Key is not set.'); /* eslint-enable no-console */
/* eslint-enable no-console */ resolve();
resolve(); /*
return; const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'immoscout', similarityCache);
} fredy.execute().then((listing) => {
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'immoscout', similarityCache); expect(listing).to.be.a('array');
fredy.execute().then((listing) => { const notificationObj = get();
expect(listing).to.be.a('array'); expect(notificationObj).to.be.a('object');
const notificationObj = get(); expect(notificationObj.serviceName).to.equal('immoscout');
expect(notificationObj).to.be.a('object'); notificationObj.payload.forEach((notify) => {
expect(notificationObj.serviceName).to.equal('immoscout'); expect(notify.id).to.be.a('number');
notificationObj.payload.forEach((notify) => { expect(notify.price).to.be.a('string');
/** check the actual structure **/ expect(notify.size).to.be.a('string');
expect(notify.id).to.be.a('number'); expect(notify.title).to.be.a('string');
expect(notify.price).to.be.a('string'); expect(notify.link).to.be.a('string');
expect(notify.size).to.be.a('string'); expect(notify.address).to.be.a('string');
expect(notify.title).to.be.a('string'); expect(notify.price).that.does.include('€');
expect(notify.link).to.be.a('string'); expect(notify.size).that.does.include('m²');
expect(notify.address).to.be.a('string'); expect(notify.title).to.be.not.empty;
/** check the values if possible **/ expect(notify.link).that.does.include('https://www.immobilienscout24.de');
expect(notify.price).that.does.include('€'); expect(notify.address).to.be.not.empty;
expect(notify.size).that.does.include('m²'); });
expect(notify.title).to.be.not.empty; resolve();
expect(notify.link).that.does.include('https://www.immobilienscout24.de'); });*/
expect(notify.address).to.be.not.empty; });
});
resolve();
});
}); });
});
}); });

@@ -3,7 +3,6 @@ import {get} from '../mocks/mockNotification.js';
import {mockFredy, providerConfig} from '../utils.js'; import {mockFredy, providerConfig} from '../utils.js';
import {expect} from 'chai'; import {expect} from 'chai';
import * as provider from '../../lib/provider/neubauKompass.js'; import * as provider from '../../lib/provider/neubauKompass.js';
import * as scrapingAnt from '../../lib/services/scrapingAnt.js';
describe('#neubauKompass testsuite()', () => { describe('#neubauKompass testsuite()', () => {
after(() => { after(() => {
@@ -13,13 +12,6 @@ describe('#neubauKompass testsuite()', () => {
it('should test neubauKompass provider', async () => { it('should test neubauKompass provider', async () => {
const Fredy = await mockFredy(); const Fredy = await mockFredy();
return await new Promise((resolve) => { return await new Promise((resolve) => {
if (!scrapingAnt.isScrapingAntApiKeySet()) {
/* eslint-disable no-console */
console.info('Skipping Neubaukompass test as ScrapingAnt Api Key is not set.');
/* eslint-enable no-console */
resolve();
return;
}
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'neubauKompass', similarityCache); const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'neubauKompass', similarityCache);
fredy.execute().then((listing) => { fredy.execute().then((listing) => {
expect(listing).to.be.a('array'); expect(listing).to.be.a('array');

@@ -9,7 +9,7 @@
"enabled": true "enabled": true
}, },
"immonet": { "immonet": {
"url": "https://www.immonet.de/immobiliensuche/beta?pageoffset=1&listsize=100&objecttype=1&locationname=D%C3%BCsseldorf&acid=&actype=&district=8717&district=8718&district=8719&district=8720&district=8721&district=8723&district=8724&district=8725&district=8727&district=8728&district=8729&district=8730&district=8731&district=8732&district=8733&district=8737&district=8738&district=8741&district=8745&district=8747&district=8750&district=8752&district=8754&district=8755&district=8756&district=8759&district=8760&district=8761&district=8763&district=8764&district=8765&ajaxIsRadiusActive=false&sortby=19&suchart=1&radius=0&pcatmtypes=1_1&pCatMTypeStoragefield=&parentcat=1&marketingtype=1&fromprice=&toprice=420000&fromarea=90&toarea=&fromplotarea=&toplotarea=&fromrooms=3&torooms=&objectcat=225&objectcat=18&objectcat=17&objectcat=12&objectcat=16&objectcat=181&objectcat=14&objectcat=15&objectcat=226&objectcat=13&wbs=-1&fromyear=&toyear=", "url": "https://www.immonet.de/classified-search?distributionTypes=Buy,Buy_Auction,Compulsory_Auction&estateTypes=House,Apartment&locations=AD08DE2112&order=Default&m=homepage_new_search_classified_search_result",
"enabled": true "enabled": true
}, },
"immowelt": { "immowelt": {

@@ -2,13 +2,13 @@ import React from 'react';
import {useDispatch, useSelector} from 'react-redux'; import {useDispatch, useSelector} from 'react-redux';
import {Divider, Input, Radio, TimePicker, Button, RadioGroup, Checkbox} from '@douyinfe/semi-ui'; import {Divider, TimePicker, Button, Checkbox} from '@douyinfe/semi-ui';
import {InputNumber} from '@douyinfe/semi-ui'; import {InputNumber} from '@douyinfe/semi-ui';
import Headline from '../../components/headline/Headline'; import Headline from '../../components/headline/Headline';
import {xhrPost} from '../../services/xhr'; import {xhrPost} from '../../services/xhr';
import {SegmentPart} from '../../components/segment/SegmentPart'; import {SegmentPart} from '../../components/segment/SegmentPart';
import {Banner, Toast} from '@douyinfe/semi-ui'; import {Banner, Toast} from '@douyinfe/semi-ui';
import {IconSave, IconCalendar, IconKey, IconRefresh, IconSignal, IconLineChartStroked, IconSearch} from '@douyinfe/semi-icons'; import {IconSave, IconCalendar, IconRefresh, IconSignal, IconLineChartStroked, IconSearch} from '@douyinfe/semi-icons';
import './GeneralSettings.less'; import './GeneralSettings.less';
function formatFromTimestamp(ts) { function formatFromTimestamp(ts) {
@@ -35,8 +35,6 @@ const GeneralSettings = function GeneralSettings() {
const [interval, setInterval] = React.useState(''); const [interval, setInterval] = React.useState('');
const [port, setPort] = React.useState(''); const [port, setPort] = React.useState('');
const [scrapingAntApiKey, setScrapingAntApiKey] = React.useState('');
const [scrapingAntProxy, setScrapingAntProxy] = React.useState('');
const [workingHourFrom, setWorkingHourFrom] = React.useState(null); const [workingHourFrom, setWorkingHourFrom] = React.useState(null);
const [workingHourTo, setWorkingHourTo] = React.useState(null); const [workingHourTo, setWorkingHourTo] = React.useState(null);
const [demoMode, setDemoMode] = React.useState(null); const [demoMode, setDemoMode] = React.useState(null);
@@ -55,10 +53,8 @@ const GeneralSettings = function GeneralSettings() {
async function init() { async function init() {
setInterval(settings?.interval); setInterval(settings?.interval);
setPort(settings?.port); setPort(settings?.port);
setScrapingAntApiKey(settings?.scrapingAnt?.apiKey);
setWorkingHourFrom(settings?.workingHours?.from); setWorkingHourFrom(settings?.workingHours?.from);
setWorkingHourTo(settings?.workingHours?.to); setWorkingHourTo(settings?.workingHours?.to);
setScrapingAntProxy(settings?.scrapingAnt?.proxy || 'datacenter');
setAnalyticsEnabled(settings?.analyticsEnabled || false); setAnalyticsEnabled(settings?.analyticsEnabled || false);
setDemoMode(settings?.demoMode || false); setDemoMode(settings?.demoMode || false);
} }
@@ -96,10 +92,6 @@ const GeneralSettings = function GeneralSettings() {
await xhrPost('/api/admin/generalSettings', { await xhrPost('/api/admin/generalSettings', {
interval, interval,
port, port,
scrapingAnt: {
apiKey: scrapingAntApiKey,
proxy: scrapingAntProxy,
},
workingHours: { workingHours: {
from: workingHourFrom, from: workingHourFrom,
to: workingHourTo, to: workingHourTo,
@@ -155,68 +147,6 @@ const GeneralSettings = function GeneralSettings() {
/> />
</SegmentPart> </SegmentPart>
<Divider margin="1rem"/> <Divider margin="1rem"/>
<SegmentPart
name="ScrapingAnt Api Key"
helpText="The api key for ScrapingAnt is used to be able to scrape Immoscout."
Icon={IconKey}
>
<Input
type="text"
placeholder="ScrapingAnt Api Key"
value={scrapingAntApiKey}
onChange={(val) => setScrapingAntApiKey(val)}
/>
</SegmentPart>
<Divider margin="1rem"/>
<SegmentPart
name="ScrapingAnt proxy settings"
helpText="Scraping ant provides different proxies."
Icon={IconKey}
>
<Banner
fullMode={false}
type="info"
closeIcon={null}
title={
<div style={{fontWeight: 600, fontSize: '14px', lineHeight: '20px'}}>
ScrapingAnt is needed to scrape Immoscout. ScrapingAnt itself is using 2
different types of proxies
</div>
}
style={{marginBottom: '1rem'}}
description={
<div>
<h4>Datacenter-Proxy</h4>
Proxy server located in one of the datacenters across the world. Datacenter
proxies are slower and
more likely to fail, but they are cheaper. A call with a datacenter proxy cost
10 credits.
<h4>Residential-Proxy</h4>
High-quality proxy server located in one of the real people houses across the
world. Datacenter
proxies are faster and more likely to success, but they are more expensive.
<br/>
<br/>
<b>
On the free tier, you have 10.000 credits, so chose your option wisely. Keep
in mind, only
successful calls will be charged.
</b>
</div>
}
/>
<RadioGroup value={scrapingAntProxy} onChange={(e) => setScrapingAntProxy(e.target.value)}>
<Radio name="datacenter" value="datacenter" checked={scrapingAntProxy === 'datacenter'}>
Datacenter proxy
</Radio>
<Radio name="residential" value="residential"
checked={scrapingAntProxy === 'residential'}>
Residential proxy
</Radio>
</RadioGroup>
</SegmentPart>
<Divider margin="1rem"/>
<SegmentPart <SegmentPart
name="Working hours" name="Working hours"
helpText="During these hours, Fredy will search for new apartments. If nothing is configured, Fredy will search around the clock." helpText="During these hours, Fredy will search for new apartments. If nothing is configured, Fredy will search around the clock."

@@ -1,31 +1,11 @@
import React from 'react'; import React from 'react';
import {format} from '../../services/time/timeService'; import {format} from '../../services/time/timeService';
import {Banner, Card, Descriptions, Divider} from '@douyinfe/semi-ui'; import {Banner, Descriptions} from '@douyinfe/semi-ui';
import {IconBolt} from '@douyinfe/semi-icons';
export default function ProcessingTimes({processingTimes = {}}) { export default function ProcessingTimes({processingTimes = {}}) {
const {Meta} = Card;
if (Object.keys(processingTimes).length === 0) { if (Object.keys(processingTimes).length === 0) {
return null; return null;
} }
if (processingTimes.error != null) {
return <Banner
fullMode={false}
type="danger"
closeIcon={null}
title={
<div style={{fontWeight: 600, fontSize: '14px', lineHeight: '20px'}}>
Scraping Ant Error
</div>
}
style={{marginBottom: '1rem'}}
description={
<div>
{processingTimes.error}
</div>
}
/>;
}
return ( return (
<> <>
<Descriptions <Descriptions
@@ -47,44 +27,6 @@ export default function ProcessingTimes({processingTimes = {}}) {
</> </>
)} )}
</Descriptions> </Descriptions>
{(processingTimes.scrapingAntData != null && Object.keys(processingTimes.scrapingAntData).length > 0) &&(
<>
<Divider margin="1rem"/>
<Card
style={{backgroundColor: '#35363c'}}
title={
<Meta
title="Remaining ScrapingAnt calls"
description="Information about your Scraping Ant Plan"
avatar={<IconBolt/>}
/>
}
>
<p>Plan: {processingTimes.scrapingAntData.plan_name}</p>
<p>
Duration: {format(new Date(processingTimes.scrapingAntData.start_date))} -{' '}
{format(new Date(processingTimes.scrapingAntData.end_date))}
<br/>
Credits: {processingTimes.scrapingAntData.remained_credits}/
{processingTimes.scrapingAntData.plan_total_credits}
</p>
If you want to scrape Immoscout or Immonet more often, you have to purchase a premium account
of{' '}
<a href="https://scrapingant.com/" target="_blank" rel="noreferrer">
ScrapingAnt
</a>
. You can use the code <b>FREDY10</b> to get 10% off. (No affiliation, we are <b>not</b> getting
paid by ScrapingAnt.)
</Card>
</>
)}
</> </>
); );
} }
/*
*/

@@ -96,17 +96,15 @@ export default function ProviderMutator({ onVisibilityChanged, visible = false,
fullMode={false} fullMode={false}
type="warning" type="warning"
closeIcon={null} closeIcon={null}
title={<div style={{ fontWeight: 600, fontSize: '14px', lineHeight: '20px' }}>ScrapingAnt</div>} title={<div style={{ fontWeight: 600, fontSize: '14px', lineHeight: '20px' }}>Warning</div>}
style={{ marginBottom: '1rem' }} style={{ marginBottom: '1rem' }}
description={ description={
<div> <div>
<p> <p>
If you chose Immoscout, Immonet or NeubauKompass as a provider, make sure to also add the scrapingAnt apiKey to the config.json. Immoscout will not work at the moment due to advanced bot detection. I'm currently working on a fix.
(See readme)
</p> </p>
<p> <p>
Do not forget to sort the results by date before copying the url to Fredy, so that Fredy always captures Until a fix has been released, Immoscout won't yield any results.
the latest search results.
</p> </p>
</div> </div>
} }

2398
yarn.lock

File diff suppressed because it is too large