mirror of
https://github.com/orangecoding/fredy.git
synced 2026-06-16 12:31:07 +00:00
Compare commits
21 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | a138dafc31 | | |
| | c6bb3c44d4 | | |
| | a3471a091a | | |
| | b5a96afcc8 | | |
| | 3903ab59cf | | |
| | 8fe7cec2a1 | | |
| | 97deea6f5b | | |
| | 1ecbbdd774 | | |
| | e1db3840f6 | | |
| | 26127eeac1 | | |
| | 90a4ee5dcf | | |
| | 2aaf63c253 | | |
| | f52e3e9fd8 | | |
| | 0d69232395 | | |
| | b473cf7fb4 | | |
| | 3b8279c714 | | |
| | 214e714c03 | | |
| | 58965a6f1b | | |
| | 3c0e9e56c6 | | |
| | f5d56a6bda | | |
| | 324b14da50 | | |
2
.github/workflows/test.yml
vendored
@@ -6,6 +6,8 @@ on:
   pull_request:
     branches:
       - master
+  schedule:
+    - cron: '0 12 * * *'
 jobs:
   test:
     name: Test
@@ -4,6 +4,11 @@ WORKDIR /fredy

 COPY . /fredy

+RUN apt-get update && apt-get install -y chromium
+
+ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
+    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
+
 RUN yarn install

 RUN yarn global add pm2
2
LICENSE
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2024 Christian Kellner
+Copyright (c) 2025 Christian Kellner

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
11
README.md
@@ -11,7 +11,7 @@ If _Fredy_ finds matching results, it will send them to you via Slack, Email, Te
 # Sponsorship [](https://github.com/sponsors/orangecoding)
 If you like my work, consider becoming a sponsor. I'm not expecting anybody to pay for _Fredy_ or any other Open Source Project I'm maintaining, however keep in mind, I'm doing all of this in my spare time :) Thanks.

-<img src="https://github.com/orangecoding/fredy/blob/master/doc/jetbrains.png" width="200">
+[](https://jb.gg/OpenSourceSupport)

 _Fredy_ is supported by JetBrains under Open Source Support Program

@@ -81,13 +81,8 @@ yarn run test
|
||||
# Architecture
|
||||

|
||||
|
||||
### Immoscout / Immonet / NeubauKompass
|
||||
I have added **experimental** support for Immoscout, Immonet and NeubauKompass. They all are somewhat special, because they have decided to secure their service from bots using Re-Capture. Finding a way around this is barely possible. For _Fredy_ to be able to bypass this check, I'm using a service called [ScrapingAnt](https://scrapingant.com/). The trick is to use a headless browser, rotating proxies and (once successfully validated) to re-send the cookies each time.
|
||||
|
||||
To be able to use Immoscout / Immonet, you need to create an account at ScrapingAnt. Configure the API key in the "General Settings" tab (visible when logged in as administrator).
|
||||
The rest will be handled by _Fredy_. Keep in mind, the support is experimental. There might be bugs and you might not always pass the re-capture check, but most of the time it works rather well :)
|
||||
|
||||
If you need more than the 1000 API calls allowed per month, I'd suggest opting for a paid account... ScrapingAnt loves OpenSource, therefore they have decided to give all _Fredy_ users a 10% discount by using the code **FREDY10** (Disclaimer: I do not earn any money for recommending their service).
|
||||
### Immoscout
|
||||
Immoscout has implemented advanced bot detection. I’m actively working on bypassing these measures, but until then, selecting Immoscout as a provider will not return any results. I apologize for the inconvenience. 😉
|
||||
|
||||
# Analytics
|
||||
Fredy is completely free (and will always remain free). However, it would be a huge help if you’d allow me to collect some analytical data.
|
||||
|
||||
@@ -1 +1 @@
-{"interval":"60","port":9998,"scrapingAnt":{"apiKey":"","proxy":"datacenter"},"workingHours":{"from":"","to":""},"demoMode":false,"analyticsEnabled":null}
+{"interval":"60","port":9998,"workingHours":{"from":"","to":""},"demoMode":false,"analyticsEnabled":null}
@@ -1,9 +1,9 @@
|
||||
import { NoNewListingsWarning } from './errors.js';
|
||||
import { setKnownListings, getKnownListings } from './services/storage/listingsStorage.js';
|
||||
import * as notify from './notification/notify.js';
|
||||
import xray from './services/scraper.js';
|
||||
import * as scrapingAnt from './services/scrapingAnt.js';
|
||||
import Extractor from './services/extractor/extractor.js';
|
||||
import urlModifier from './services/queryStringMutator.js';
|
||||
|
||||
class FredyRuntime {
|
||||
/**
|
||||
*
|
||||
@@ -20,6 +20,7 @@ class FredyRuntime {
|
||||
this._jobKey = jobKey;
|
||||
this._similarityCache = similarityCache;
|
||||
}
|
||||
|
||||
execute() {
|
||||
return (
|
||||
//modify the url to make sure search order is correctly set
|
||||
@@ -42,56 +43,40 @@ class FredyRuntime {
|
||||
.catch(this._handleError.bind(this))
|
||||
);
|
||||
}
|
||||
|
||||
_getListings(url) {
|
||||
const extractor = new Extractor();
|
||||
return new Promise((resolve, reject) => {
|
||||
const id = this._providerId;
|
||||
if (scrapingAnt.needScrapingAnt(id) && !scrapingAnt.isScrapingAntApiKeySet()) {
|
||||
const error = 'Immoscout or Immonet can only be used with if you have set an apikey for scrapingAnt.';
|
||||
/* eslint-disable no-console */
|
||||
console.log(error);
|
||||
/* eslint-enable no-console */
|
||||
reject(error);
|
||||
return;
|
||||
}
|
||||
const u = scrapingAnt.needScrapingAnt(id) ? scrapingAnt.transformUrlForScrapingAnt(url, id) : url;
|
||||
try {
|
||||
if (this._providerConfig.paginate != null) {
|
||||
xray(u, this._providerConfig.crawlContainer, [this._providerConfig.crawlFields])
|
||||
//the first 2 pages should be enough here
|
||||
.limit(2)
|
||||
.paginate(this._providerConfig.paginate)
|
||||
.then((listings) => {
|
||||
resolve(listings == null ? [] : listings);
|
||||
})
|
||||
.catch((err) => {
|
||||
reject(err);
|
||||
console.error(err);
|
||||
});
|
||||
} else {
|
||||
xray(u, this._providerConfig.crawlContainer, [this._providerConfig.crawlFields])
|
||||
.then((listings) => {
|
||||
resolve(listings == null ? [] : listings);
|
||||
})
|
||||
.catch((err) => {
|
||||
reject(err);
|
||||
console.error(err);
|
||||
});
|
||||
}
|
||||
} catch (error) {
|
||||
reject(error);
|
||||
console.error(error);
|
||||
}
|
||||
extractor
|
||||
.execute(url, this._providerConfig.waitForSelector)
|
||||
.then(() => {
|
||||
const listings = extractor.parseResponseText(
|
||||
this._providerConfig.crawlContainer,
|
||||
this._providerConfig.crawlFields,
|
||||
url,
|
||||
);
|
||||
resolve(listings == null ? [] : listings);
|
||||
})
|
||||
.catch((err) => {
|
||||
reject(err);
|
||||
/* eslint-disable no-console */
|
||||
console.error(err);
|
||||
/* eslint-enable no-console */
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
_normalize(listings) {
|
||||
return listings.map(this._providerConfig.normalize);
|
||||
}
|
||||
|
||||
_filter(listings) {
|
||||
//only return those where all the fields have been found
|
||||
const keys = Object.keys(this._providerConfig.crawlFields);
|
||||
const filteredListings = listings.filter((item) => keys.every((key) => key in item));
|
||||
return filteredListings.filter(this._providerConfig.filter);
|
||||
}
|
||||
|
||||
_findNew(listings) {
|
||||
const newListings = listings.filter((o) => getKnownListings(this._jobKey, this._providerId)[o.id] == null);
|
||||
if (newListings.length === 0) {
|
||||
@@ -99,6 +84,7 @@ class FredyRuntime {
|
||||
}
|
||||
return newListings;
|
||||
}
|
||||
|
||||
_notify(newListings) {
|
||||
if (newListings.length === 0) {
|
||||
throw new NoNewListingsWarning();
|
||||
@@ -106,6 +92,7 @@ class FredyRuntime {
|
||||
const sendNotifications = notify.send(this._providerId, newListings, this._notificationConfig, this._jobKey);
|
||||
return Promise.all(sendNotifications).then(() => newListings);
|
||||
}
|
||||
|
||||
_save(newListings) {
|
||||
const currentListings = getKnownListings(this._jobKey, this._providerId) || {};
|
||||
newListings.forEach((listing) => {
|
||||
@@ -114,6 +101,7 @@ class FredyRuntime {
|
||||
setKnownListings(this._jobKey, this._providerId, currentListings);
|
||||
return newListings;
|
||||
}
|
||||
|
||||
_filterBySimilarListings(listings) {
|
||||
const filteredList = listings.filter((listing) => {
|
||||
const similar = this._similarityCache.hasSimilarEntries(this._jobKey, listing.title);
|
||||
@@ -127,8 +115,10 @@ class FredyRuntime {
|
||||
filteredList.forEach((filter) => this._similarityCache.addCacheEntry(this._jobKey, filter.title));
|
||||
return filteredList;
|
||||
}
|
||||
|
||||
_handleError(err) {
|
||||
if (err.name !== 'NoNewListingsWarning') console.error(err);
|
||||
}
|
||||
}
|
||||
|
||||
export default FredyRuntime;
|
||||
|
||||
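The rewritten `_getListings` above appears to hand page loading to the new `Extractor` and to fall back to an empty list when nothing is parsed. The same flow can be sketched with `async`/`await` instead of a hand-built `Promise`. This is only an illustration: it assumes the two `Extractor` methods visible in this diff (`execute` and `parseResponseText`), while the standalone `getListings` function and the stub extractor are hypothetical and not part of Fredy.

```javascript
// Hypothetical standalone version of the flow in `_getListings`.
// `extractor` is any object exposing execute(url, waitForSelector) and
// parseResponseText(crawlContainer, crawlFields, url), as seen in the diff.
async function getListings(extractor, providerConfig, url) {
  await extractor.execute(url, providerConfig.waitForSelector);
  const listings = extractor.parseResponseText(
    providerConfig.crawlContainer,
    providerConfig.crawlFields,
    url,
  );
  // A null result (nothing parsed) is normalized to an empty list.
  return listings == null ? [] : listings;
}

// Stub standing in for the real Puppeteer-backed Extractor (assumption).
const stubExtractor = {
  execute: async () => stubExtractor,
  parseResponseText: () => null,
};

const stubConfig = { waitForSelector: 'body', crawlContainer: '.item', crawlFields: {} };

getListings(stubExtractor, stubConfig, 'https://example.org').then((listings) => {
  console.log(listings.length); // logs 0
});
```

Errors thrown by `execute` or `parseResponseText` reject the returned promise automatically in this form, so no explicit `reject` call is needed.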
@@ -1,12 +1,9 @@
|
||||
import restana from 'restana';
|
||||
import fetch from 'node-fetch';
|
||||
import * as jobStorage from '../../services/storage/jobStorage.js';
|
||||
import * as userStorage from '../../services/storage/userStorage.js';
|
||||
import * as immoscoutProvider from '../../provider/immoscout.js';
|
||||
import { config } from '../../utils.js';
|
||||
import { isAdmin } from '../security.js';
|
||||
import {isScrapingAntApiKeySet} from '../../services/scrapingAnt.js';
|
||||
import {trackDemoJobCreated} from '../../services/tracking/Tracker.js';
|
||||
import { trackDemoJobCreated } from '../../services/tracking/Tracker.js';
|
||||
const service = restana();
|
||||
const jobRouter = service.newRouter();
|
||||
function doesJobBelongsToUser(job, req) {
|
||||
@@ -27,34 +24,14 @@ jobRouter.get('/', async (req, res) => {
|
||||
res.send();
|
||||
});
|
||||
jobRouter.get('/processingTimes', async (req, res) => {
|
||||
let scrapingAntData = {};
|
||||
if (isScrapingAntApiKeySet()) {
|
||||
try {
|
||||
const response = await fetch(`https://api.scrapingant.com/v2/usage?x-api-key=${config.scrapingAnt.apiKey}`);
|
||||
scrapingAntData = await response.json();
|
||||
} catch (Exception) {
|
||||
console.error('Could not query plan data from scraping ant.', Exception);
|
||||
}
|
||||
}
|
||||
res.body = {
|
||||
interval: config.interval,
|
||||
lastRun: config.lastRun || null,
|
||||
scrapingAntData,
|
||||
error: scrapingAntData?.detail == null ? null : scrapingAntData?.detail
|
||||
};
|
||||
res.send();
|
||||
});
|
||||
jobRouter.post('/', async (req, res) => {
|
||||
const { provider, notificationAdapter, name, blacklist = [], jobId, enabled } = req.body;
|
||||
if (
|
||||
provider.find((p) => p.id === immoscoutProvider.metaInformation.id) != null &&
|
||||
(config.scrapingAnt.apiKey == null || config.scrapingAnt.apiKey.length === 0)
|
||||
) {
|
||||
res.send(
|
||||
new Error('To use Immoscout as provider, you need to configure ScrapingAnt first. Please check the readme.')
|
||||
);
|
||||
return;
|
||||
}
|
||||
try {
|
||||
jobStorage.upsertJob({
|
||||
userId: req.session.currentUser,
|
||||
@@ -72,7 +49,7 @@ jobRouter.post('/', async (req, res) => {
|
||||
trackDemoJobCreated({
|
||||
name,
|
||||
provider,
|
||||
adapter: notificationAdapter
|
||||
adapter: notificationAdapter,
|
||||
});
|
||||
res.send();
|
||||
});
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
export const DEFAULT_CONFIG = {
|
||||
'interval': '60',
|
||||
'port': 9998,
|
||||
'scrapingAnt': {'apiKey': '', 'proxy': 'datacenter'},
|
||||
'workingHours': {'from': '', 'to': ''},
|
||||
'demoMode': false,
|
||||
'analyticsEnabled': null
|
||||
|
||||
@@ -7,9 +7,11 @@ export const send = ({ serviceName, newListings, notificationConfig, jobKey }) =
|
||||
const job = getJob(jobKey);
|
||||
const jobName = job == null ? jobKey : job.name;
|
||||
const promises = newListings.map((newListing) => {
|
||||
const message = `Address: ${newListing.address} Size: ${newListing.size.replace(/2m/g, '$m^2$')} Price: ${
|
||||
newListing.price
|
||||
}`;
|
||||
const message = `
|
||||
Address: ${newListing.address}
|
||||
Size: ${newListing.size.replace(/2m/g, '$m^2$')}
|
||||
Price: ${newListing.price}
|
||||
Link: ${newListing.link}`;
|
||||
return fetch(server, {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({
|
||||
|
||||
@@ -1,50 +1,73 @@
|
||||
import { markdown2Html } from '../../services/markdown.js';
|
||||
import { getJob } from '../../services/storage/jobStorage.js';
|
||||
import {markdown2Html} from '../../services/markdown.js';
|
||||
import {getJob} from '../../services/storage/jobStorage.js';
|
||||
import fetch from 'node-fetch';
|
||||
|
||||
export const send = ({ serviceName, newListings, notificationConfig, jobKey }) => {
|
||||
const { token, user, device } = notificationConfig.find((adapter) => adapter.id === config.id).fields;
|
||||
const job = getJob(jobKey);
|
||||
const jobName = job == null ? jobKey : job.name;
|
||||
const promises = newListings.map((newListing) => {
|
||||
const title = `${jobName} at ${serviceName}: ${newListing.title}`;
|
||||
const message = `Address: ${newListing.address}\nSize: ${newListing.size}\nPrice: ${newListing.price}\nLink: ${newListing.link}`;
|
||||
return fetch('https://api.pushover.net/1/messages.json', {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({
|
||||
token: token,
|
||||
user: user,
|
||||
message: message,
|
||||
device: device,
|
||||
title: title,
|
||||
}),
|
||||
export const send = ({serviceName, newListings, notificationConfig, jobKey}) => {
|
||||
const {token, user, device} = notificationConfig.find((adapter) => adapter.id === config.id).fields;
|
||||
const job = getJob(jobKey);
|
||||
const jobName = job == null ? jobKey : job.name;
|
||||
const promises = newListings.map((newListing) => {
|
||||
const title = `${jobName} at ${serviceName}: ${newListing.title}`;
|
||||
const message = `Address: ${newListing.address}\nSize: ${newListing.size}\nPrice: ${newListing.price}\nLink: ${newListing.link}`;
|
||||
return fetch('https://api.pushover.net/1/messages.json', {
|
||||
method: 'POST',
|
||||
headers: {'Content-Type': 'application/json'},
|
||||
body: JSON.stringify({
|
||||
token: token,
|
||||
user: user,
|
||||
message: message,
|
||||
device: device,
|
||||
title: title,
|
||||
}),
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
return Promise.all(promises);
|
||||
return Promise.all(promises)
|
||||
.then((responses) => {
|
||||
// Convert all responses to JSON
|
||||
return Promise.all(responses.map((response) => response.json()));
|
||||
})
|
||||
.then((data) => {
|
||||
// Check for errors in the data
|
||||
const error = data
|
||||
.map((item) => (item.errors != null && item.errors.length > 0 ? item.errors.join(', ') : null))
|
||||
.filter((err) => err !== null);
|
||||
|
||||
if (error.length > 0) {
|
||||
// Reject with the combined error messages
|
||||
return Promise.reject(error.join('; '));
|
||||
}
|
||||
|
||||
return data;
|
||||
})
|
||||
.then(() => {
|
||||
return Promise.resolve();
|
||||
})
|
||||
.catch((error) => {
|
||||
return Promise.reject(error);
|
||||
});
|
||||
};
|
||||
|
||||
export const config = {
|
||||
id: 'pushover',
|
||||
name: 'Pushover',
|
||||
readme: markdown2Html('lib/notification/adapter/pushover.md'),
|
||||
description: 'Fredy will send new listings to your mobile using Pushover.',
|
||||
fields: {
|
||||
token: {
|
||||
type: 'text',
|
||||
label: 'API token',
|
||||
description: 'Your application\'s API token.',
|
||||
id: 'pushover',
|
||||
name: 'Pushover',
|
||||
readme: markdown2Html('lib/notification/adapter/pushover.md'),
|
||||
description: 'Fredy will send new listings to your mobile using Pushover.',
|
||||
fields: {
|
||||
token: {
|
||||
type: 'text',
|
||||
label: 'API token',
|
||||
description: 'Your application\'s API token.',
|
||||
},
|
||||
user: {
|
||||
type: 'text',
|
||||
label: 'User key',
|
||||
description: 'Your user/group key.',
|
||||
},
|
||||
device: {
|
||||
type: 'text',
|
||||
label: 'Device name',
|
||||
description: 'The device name to send your notification to. Messages may be addressed to multiple specific devices by joining them with a comma.',
|
||||
},
|
||||
},
|
||||
user: {
|
||||
type: 'text',
|
||||
label: 'User key',
|
||||
description: 'Your user/group key.',
|
||||
},
|
||||
device: {
|
||||
type: 'text',
|
||||
label: 'Device name',
|
||||
description: 'The device name to send your notification to. Messages may be addressed to multiple specific devices by joining them with a comma.',
|
||||
},
|
||||
},
|
||||
};
|
||||
|
||||
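The new promise chain in the Pushover adapter above parses every response body and rejects when any of them lists errors. That check can be isolated as a small pure function. The sketch below assumes each parsed body may carry an `errors` array of strings, as the diff's own check implies; `collectErrors` is a hypothetical name and not part of Fredy.

```javascript
// Hypothetical helper: returns one joined message per response body that
// reports errors, and an empty array when every body is clean.
function collectErrors(bodies) {
  return bodies
    .map((item) => (item.errors != null && item.errors.length > 0 ? item.errors.join(', ') : null))
    .filter((err) => err !== null);
}

const errors = collectErrors([
  { status: 1 },
  { status: 0, errors: ['application token is invalid'] },
]);

console.log(errors.join('; ')); // application token is invalid
```

With a helper like this, `send` would reject only when the returned list is non-empty. The trailing `.catch((error) => Promise.reject(error))` in the diff re-rejects with the same error, so it does not change the outcome.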
@@ -2,14 +2,10 @@ import utils, { buildHash } from '../utils.js';
|
||||
let appliedBlackList = [];
|
||||
|
||||
function normalize(o) {
|
||||
let size = `${o.size.replace(' Wohnfläche ', '').trim()}`;
|
||||
if (o.rooms != null) {
|
||||
size += ` / / ${o.rooms.trim()}`;
|
||||
}
|
||||
const link = `https://www.1a-immobilienmarkt.de/expose/${o.id}.html`;
|
||||
const price = normalizePrice(o.price);
|
||||
const id = buildHash(o.id, price);
|
||||
return Object.assign(o, { id, price, size, link });
|
||||
return Object.assign(o, { id, price, link });
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -39,12 +35,12 @@ const config = {
|
||||
url: null,
|
||||
crawlContainer: '.tabelle',
|
||||
sortByDateParam: 'sort_type=newest',
|
||||
waitForSelector: 'body',
|
||||
crawlFields: {
|
||||
id: '.inner_object_data input[name="marker_objekt_id"]@value | int',
|
||||
price: '.tabelle .inner_object_data .single_data_price | removeNewline | trim',
|
||||
size: '.tabelle .inner_object_data .data_boxes div:nth-child(1)',
|
||||
rooms: '.tabelle .inner_object_data .data_boxes div:nth-child(2)',
|
||||
title: '.tabelle .inner_object_data .tabelle_inhalt_titel_black | removeNewline | trim',
|
||||
price: '.inner_object_data .single_data_price | removeNewline | trim',
|
||||
size: '.tabelle .tabelle_inhalt_infos .single_data_box | removeNewline | trim',
|
||||
title: '.inner_object_data .tabelle_inhalt_titel_black | removeNewline | trim',
|
||||
},
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
|
||||
@@ -11,8 +11,9 @@ function normalize(o) {
|
||||
const price = o.price || 'N/A €';
|
||||
const title = o.title || 'No title available';
|
||||
const address = o.address || 'No address available';
|
||||
const link = shortenLink(o.link);
|
||||
const id = buildHash(parseId(shortenLink(o.link)), o.price);
|
||||
const shortLink = shortenLink(o.link);
|
||||
const link = `https://www.immobilien.de/${shortLink}`;
|
||||
const id = buildHash(parseId(shortLink), o.price);
|
||||
return Object.assign(o, { id, price, size, title, address, link });
|
||||
}
|
||||
function applyBlacklist(o) {
|
||||
@@ -22,9 +23,11 @@ function applyBlacklist(o) {
|
||||
}
|
||||
const config = {
|
||||
url: null,
|
||||
crawlContainer: '.estates_list .list_immo a._ref',
|
||||
crawlContainer: '._ref',
|
||||
sortByDateParam: 'sort_col=*created_ts&sort_dir=desc',
|
||||
waitForSelector: 'body',
|
||||
crawlFields: {
|
||||
id: '@href', //will be transformed later
|
||||
price: '.list_entry .immo_preis .label_info',
|
||||
size: '.list_entry .flaeche .label_info | removeNewline | trim',
|
||||
title: '.list_entry .part_text h3 span',
|
||||
@@ -32,7 +35,6 @@ const config = {
|
||||
link: '@href',
|
||||
address: '.list_entry .place',
|
||||
},
|
||||
paginate: '.list_immo .blocknav .blocknav_list li.next a@href',
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
};
|
||||
|
||||
@@ -1,12 +1,20 @@
|
||||
import utils, {buildHash} from '../utils.js';
|
||||
import utils, { buildHash } from '../utils.js';
|
||||
let appliedBlackList = [];
|
||||
|
||||
/**
|
||||
* Note, Immonet is rly a piece of sh*t. It is using a weird combination of React and some buttons (instead of links),
|
||||
* so that if somebody clicks the listing, a new page will open with the actual link to the listing. Of course, a scraper
|
||||
* cannot do this (which is why I always just return the link to the whole list of listings).
|
||||
* This is not only bad for us, but also bad for ppl with disabilities...
|
||||
*/
|
||||
|
||||
function normalize(o) {
|
||||
const size = o.size != null ? o.size.replace('Wohnfläche ', '') : 'N/A m²';
|
||||
const price = o.price.replace('Kaufpreis ', '');
|
||||
const address = o.address.split(' • ')[o.address.split(' • ').length - 1];
|
||||
const title = o.title || 'No title available';
|
||||
const link = o.id;
|
||||
const id = buildHash(o.id.substring(o.id.lastIndexOf('/') + 1, o.id.length), price);
|
||||
const link = config.url;
|
||||
const id = buildHash(title, price);
|
||||
return Object.assign(o, { id, address, price, size, title, link });
|
||||
}
|
||||
function applyBlacklist(o) {
|
||||
@@ -16,16 +24,16 @@ function applyBlacklist(o) {
|
||||
}
|
||||
const config = {
|
||||
url: null,
|
||||
crawlContainer: '.content-wrapper-tiles .ng-star-inserted',
|
||||
crawlContainer: 'div[data-testid="serp-core-classified-card-testid"]',
|
||||
sortByDateParam: 'sortby=19',
|
||||
waitForSelector: 'div[data-testid="serp-resultscount-testid"]',
|
||||
crawlFields: {
|
||||
id: '.card a@href',
|
||||
title: '.card h3 |trim',
|
||||
price: '.card .has-font-300 .is-bold | trim',
|
||||
size: '.card .has-font-300 .ml-100 | trim',
|
||||
address: '.card span:nth-child(2) | trim',
|
||||
id: 'button@title |trim', // immonet is a piece of sh*t. See comment above
|
||||
title: 'button@title |trim',
|
||||
price: 'div[data-testid="cardmfe-price-testid"] | trim',
|
||||
size: 'div[data-testid="cardmfe-keyfacts-testid"] | trim',
|
||||
address: 'div[data-testid="cardmfe-description-box-address"] | trim',
|
||||
},
|
||||
paginate: '#idResultList .margin-bottom-6.margin-bottom-sm-12 .panel a.pull-right@href',
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
};
|
||||
|
||||
@@ -17,6 +17,7 @@ const config = {
|
||||
url: null,
|
||||
crawlContainer: '#resultListItems li.result-list__listing',
|
||||
sortByDateParam: 'sorting=2',
|
||||
waitForSelector: 'body',
|
||||
crawlFields: {
|
||||
id: '.result-list-entry@data-obid | int',
|
||||
price: '.result-list-entry .result-list-entry__criteria .grid-item:first-child dd | removeNewline | trim',
|
||||
@@ -25,7 +26,6 @@ const config = {
|
||||
link: '.result-list-entry .result-list-entry__brand-title-container@href',
|
||||
address: '.result-list-entry .result-list-entry__map-link',
|
||||
},
|
||||
paginate: '#pager .align-right a@href',
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
};
|
||||
|
||||
@@ -1,48 +1,48 @@
|
||||
import utils, {buildHash} from '../utils.js';
|
||||
import utils, { buildHash } from '../utils.js';
|
||||
|
||||
let appliedBlackList = [];
|
||||
|
||||
function normalize(o) {
|
||||
const size = o.size || 'N/A m²';
|
||||
const price = (o.price || '--- €').replace('Preis auf Anfrage', '--- €');
|
||||
const title = o.title || 'No title available';
|
||||
const immoId = o.id.substring(o.id.indexOf('-') + 1, o.id.length);
|
||||
const link = `https://immo.swp.de/immobilien/${immoId}`;
|
||||
const description = o.description;
|
||||
const id = buildHash(immoId, price);
|
||||
return Object.assign(o, {id, price, size, title, link, description});
|
||||
const size = o.size || 'N/A m²';
|
||||
const price = (o.price || '--- €').replace('Preis auf Anfrage', '--- €');
|
||||
const title = o.title || 'No title available';
|
||||
const immoId = o.id.substring(o.id.indexOf('-') + 1, o.id.length);
|
||||
const link = `https://immo.swp.de/immobilien/${immoId}`;
|
||||
const description = o.description;
|
||||
const id = buildHash(immoId, price);
|
||||
return Object.assign(o, { id, price, size, title, link, description });
|
||||
}
|
||||
|
||||
function applyBlacklist(o) {
|
||||
const titleNotBlacklisted = !utils.isOneOf(o.title, appliedBlackList);
|
||||
const descNotBlacklisted = !utils.isOneOf(o.description, appliedBlackList);
|
||||
return titleNotBlacklisted && descNotBlacklisted;
|
||||
const titleNotBlacklisted = !utils.isOneOf(o.title, appliedBlackList);
|
||||
const descNotBlacklisted = !utils.isOneOf(o.description, appliedBlackList);
|
||||
return titleNotBlacklisted && descNotBlacklisted;
|
||||
}
|
||||
|
||||
const config = {
|
||||
url: null,
|
||||
crawlContainer: '.js-serp-item',
|
||||
sortByDateParam: 's=most_recently_updated_first',
|
||||
crawlFields: {
|
||||
id: '.js-bookmark-btn@data-id',
|
||||
price: 'div.align-items-start div:first-child | trim',
|
||||
size: 'div.align-items-start div:nth-child(3) | trim',
|
||||
title: '.card-title h2 | trim',
|
||||
link: '.ci-search-result__link@href',
|
||||
description: '.js-show-more-item-sm | removeNewline | trim',
|
||||
},
|
||||
paginate: 'li.page-item.pagination__item a.page-link@href',
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
url: null,
|
||||
crawlContainer: '.js-serp-item',
|
||||
sortByDateParam: 's=most_recently_updated_first',
|
||||
waitForSelector: 'body',
|
||||
crawlFields: {
|
||||
id: '.js-bookmark-btn@data-id',
|
||||
price: 'div.align-items-start div:first-child | trim',
|
||||
size: 'div.align-items-start div:nth-child(3) | trim',
|
||||
title: '.js-item-title-link@title | trim',
|
||||
link: '.ci-search-result__link@href',
|
||||
description: '.js-show-more-item-sm | removeNewline | trim',
|
||||
},
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
};
|
||||
export const init = (sourceConfig, blacklist) => {
|
||||
config.enabled = sourceConfig.enabled;
|
||||
config.url = sourceConfig.url;
|
||||
appliedBlackList = blacklist || [];
|
||||
config.enabled = sourceConfig.enabled;
|
||||
config.url = sourceConfig.url;
|
||||
appliedBlackList = blacklist || [];
|
||||
};
|
||||
export const metaInformation = {
|
||||
name: 'Immo Südwest Presse',
|
||||
baseUrl: 'https://immo.swp.de/',
|
||||
id: 'immoswp',
|
||||
name: 'Immo Südwest Presse',
|
||||
baseUrl: 'https://immo.swp.de/',
|
||||
id: 'immoswp',
|
||||
};
|
||||
export {config};
|
||||
export { config };
|
||||
|
||||
@@ -16,17 +16,17 @@ function applyBlacklist(o) {
|
||||
const config = {
|
||||
url: null,
|
||||
crawlContainer:
|
||||
'div[data-testid="serp-card-testid"]:not(div[data-testid="serp-enlargementlist-testid"] div[data-testid="serp-card-testid"])',
|
||||
'div[data-testid="serp-core-scrollablelistview-testid"]:not(div[data-testid="serp-enlargementlist-testid"] div[data-testid="serp-card-testid"]) div[data-testid="serp-core-classified-card-testid"]',
|
||||
sortByDateParam: 'order=DateDesc',
|
||||
waitForSelector: 'div[data-testid="serp-gridcontainer-testid"]',
|
||||
crawlFields: {
|
||||
id: 'a@id',
|
||||
id: 'a@href',
|
||||
price: 'div[data-testid="cardmfe-price-testid"] | removeNewline | trim',
|
||||
size: 'div[data-testid="cardmfe-keyfacts-testid"] | removeNewline | trim',
|
||||
title: '.css-1cbj9xw',
|
||||
link: 'a@href',
|
||||
address: 'div[data-testid="cardmfe-description-box-address"] | removeNewline | trim',
|
||||
},
|
||||
paginate: '#pnlPaging #nlbPlus@href',
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
};
|
||||
|
||||
@@ -6,7 +6,8 @@ let appliedBlacklistedDistricts = [];
|
||||
function normalize(o) {
|
||||
const size = o.size || '--- m²';
|
||||
const id = buildHash(o.id, o.price);
|
||||
return Object.assign(o, {id, size});
|
||||
const link = `https://www.kleinanzeigen.de${o.link}`;
|
||||
return Object.assign(o, {id, size, link});
|
||||
}
|
||||
|
||||
function applyBlacklist(o) {
|
||||
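The `normalize` change above builds an absolute `link` by concatenating the site origin with the scraped path. A possible alternative, sketched under the assumption that the scraped value is either a site-relative path or already an absolute URL, is the standard `URL` constructor, which handles both cases. `toAbsoluteLink` is a hypothetical helper and not part of Fredy.

```javascript
// Hypothetical helper: resolves a scraped href against the provider's base URL.
// Returns null when no link was scraped.
function toAbsoluteLink(link, baseUrl) {
  if (link == null || link.length === 0) return null;
  // new URL(relative, base) resolves relative paths and leaves absolute URLs untouched.
  return new URL(link, baseUrl).href;
}

console.log(toAbsoluteLink('/s-anzeige/example/123', 'https://www.kleinanzeigen.de'));
// https://www.kleinanzeigen.de/s-anzeige/example/123
```

String concatenation, as used in the diff, is equally fine as long as the scraped value is always a path starting with `/`.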
@@ -14,7 +15,7 @@ function applyBlacklist(o) {
|
||||
const descNotBlacklisted = !utils.isOneOf(o.description, appliedBlackList);
|
||||
const isBlacklistedDistrict =
|
||||
appliedBlacklistedDistricts.length === 0 ? false : utils.isOneOf(o.description, appliedBlacklistedDistricts);
|
||||
return !isBlacklistedDistrict && titleNotBlacklisted && descNotBlacklisted;
|
||||
return o.title != null && !isBlacklistedDistrict && titleNotBlacklisted && descNotBlacklisted;
|
||||
}
|
||||
|
||||
const config = {
|
||||
@@ -22,16 +23,16 @@ const config = {
|
||||
crawlContainer: '#srchrslt-adtable .ad-listitem ',
|
||||
//sort by date is standard oO
|
||||
sortByDateParam: null,
|
||||
waitForSelector: 'body',
|
||||
crawlFields: {
|
||||
id: '.aditem@data-adid | int',
|
||||
price: '.aditem-main--middle--price-shipping--price | removeNewline | trim',
|
||||
size: '.aditem-main .text-module-end span:nth-child(2) | removeNewline | trim',
|
||||
size: '.aditem-main .text-module-end | removeNewline | trim',
|
||||
title: '.aditem-main .text-module-begin a | removeNewline | trim',
|
||||
link: '.aditem-main .text-module-begin a@href | removeNewline | trim',
|
||||
description: '.aditem-main p:not(.text-module-end) | removeNewline | trim',
|
||||
description: '.aditem-main .aditem-main--middle--description | removeNewline | trim',
|
||||
address: '.aditem-main--top--left | trim | removeNewline',
|
||||
},
|
||||
paginate: '#srchrslt-pagination .pagination-next@href',
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
};
|
||||
|
||||
@@ -8,7 +8,7 @@ function nullOrEmpty(val) {
|
||||
|
||||
function normalize(o) {
|
||||
const link = nullOrEmpty(o.link) ? 'NO LINK' : `https://www.neubaukompass.de${o.link.substring(o.link.indexOf('/neubau'))}`;
|
||||
const id = buildHash(o.id, o.price);
|
||||
const id = buildHash(o.link, o.price);
|
||||
return Object.assign(o, {id, link});
|
||||
}
|
||||
|
||||
@@ -18,16 +18,16 @@ function applyBlacklist(o) {
|
||||
|
||||
const config = {
|
||||
url: null,
|
||||
crawlContainer: '.nbk-container >div article',
|
||||
crawlContainer: '.col-12.mb-4',
|
||||
sortByDateParam: 'Sortierung=Id&Richtung=DESC',
|
||||
waitForSelector: '.nbk-section',
|
||||
crawlFields: {
|
||||
id: '@id',
|
||||
title: 'a.nbk-truncate@title | removeNewline | trim',
|
||||
link: 'a.nbk-truncate@href',
|
||||
address: 'p.nbk-truncate | removeNewline | trim',
|
||||
price: 'p.nbk-mb-0 | removeNewline | trim',
|
||||
id: 'a@href',
|
||||
title: 'a@title | removeNewline | trim',
|
||||
link: 'a@href',
|
||||
address: '.nbk-project-card__description | removeNewline | trim',
|
||||
price: '.nbk-project-card__spec-item .nbk-project-card__spec-value | removeNewline | trim',
|
||||
},
|
||||
paginate: '.numbered-pager__bottom .numbered-pager--info li:nth-child(2) a@href',
|
||||
normalize: normalize,
|
||||
filter: applyBlacklist,
|
||||
};
|
||||
|
||||
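The `crawlFields` values in these provider configs follow a small convention: a CSS selector, an optional `@attribute` suffix, and optional `|`-separated filter names such as `removeNewline` or `trim`. The sketch below splits such a string into its parts. It mirrors the selector and attribute split visible in the (truncated) `parser.js` later in this diff; how filters are applied is not visible there, so the `filters` array is simply returned and that part is an assumption. `splitFieldSelector` is a hypothetical name.

```javascript
// Hypothetical helper: splits 'selector@attr | filterA | filterB' into parts.
function splitFieldSelector(fieldSelector) {
  const [rawSelector, ...filters] = fieldSelector.split('|').map((part) => part.trim());
  const atIndex = rawSelector.indexOf('@');
  const selector = atIndex === -1 ? rawSelector : rawSelector.slice(0, atIndex).trim();
  const attribute = atIndex === -1 ? null : rawSelector.slice(atIndex + 1).trim();
  return { selector, attribute, filters };
}

console.log(splitFieldSelector('a@title | removeNewline | trim'));
// { selector: 'a', attribute: 'title', filters: [ 'removeNewline', 'trim' ] }
```

An empty `selector` with a non-null `attribute` (for example `'@data-id'`) corresponds to reading the attribute from the container element itself.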
@@ -4,7 +4,8 @@ let appliedBlackList = [];
|
||||
|
||||
function normalize(o) {
|
||||
const id = buildHash(o.id, o.price);
|
||||
return Object.assign(o, {id});
|
||||
const link = `https://www.wg-gesucht.de${o.link}`;
|
||||
return Object.assign(o, { id, link });
|
||||
}
|
||||
|
||||
function applyBlacklist(o) {
|
||||
@@ -17,6 +18,7 @@ const config = {
|
||||
url: null,
|
||||
crawlContainer: '#main_column .wgg_card',
|
||||
sortByDateParam: 'sort_column=0&sort_order=0',
|
||||
waitForSelector: 'body',
|
||||
crawlFields: {
|
||||
id: '@data-id',
|
||||
details: '.row .noprint .col-xs-11 |removeNewline |trim',
|
||||
|
||||
43
lib/services/extractor/extractor.js
Normal file
@@ -0,0 +1,43 @@
+import { setDebug } from './utils.js';
+import puppeteerExtractor from './puppeteerExtractor.js';
+import { loadParser, parse } from './parser/parser.js';
+
+const DEFAULT_OPTIONS = {
+  debug: false,
+  puppeteerTimeout: 60_000,
+  puppeteerHeadless: true,
+};
+
+export default class Extractor {
+  constructor(options) {
+    this.options = {
+      ...DEFAULT_OPTIONS,
+      ...options,
+    };
+    this.responseText = null;
+    setDebug(this.options);
+  }
+
+  /**
+   * if you are extracting data from a SPA, you must provide a selector, otherwise
+   * your response will never contain what you are really looking for
+   * @param url
+   * @param waitForSelector
+   */
+  execute = async (url, waitForSelector = null) => {
+    this.responseText = null;
+    try {
+      this.responseText = await puppeteerExtractor(url, waitForSelector, this.options);
+      if (this.responseText != null) {
+        loadParser(this.responseText);
+      }
+    } catch (error) {
+      console.error('Error trying to load page.', error);
+    }
+    return this;
+  };
+
+  parseResponseText = (crawlContainer, crawlFields, url) => {
+    return parse(crawlContainer, crawlFields, this.responseText, url);
+  };
+}
|
||||
97
lib/services/extractor/parser/parser.js
Normal file
@@ -0,0 +1,97 @@
|
||||
import * as cheerio from 'cheerio';
|
||||
|
||||
let $ = null;
|
||||
|
||||
export function loadParser(text) {
|
||||
$ = cheerio.load(text);
|
||||
}
|
||||
|
||||
export function parse(crawlContainer, crawlFields, text, url) {
|
||||
if (!text) {
|
||||
console.warn('Cannot parse, text was empty for url ', url);
|
||||
return null;
|
||||
}
|
||||
|
||||
if (!crawlContainer || !crawlFields) {
|
||||
console.warn('Cannot parse, selector was empty for url ', url);
|
||||
return null;
|
||||
}
|
||||
|
||||
const result = [];
|
||||
|
||||
if ($(crawlContainer).length === 0) {
|
||||
console.warn('No elements in crawl container found for url ', url);
|
||||
return null;
|
||||
}
|
||||
|
||||
$(crawlContainer).each((_, element) => {
|
||||
const container = $(element);
|
||||
const parsedObject = {};
|
||||
|
||||
// Parse fields based on crawlFields
|
||||
for (const [key, fieldSelector] of Object.entries(crawlFields)) {
|
||||
let value;
|
||||
|
||||
try {
|
||||
const selector = fieldSelector.includes('|')
|
||||
? fieldSelector.substring(0, fieldSelector.indexOf('|')).trim()
|
||||
: fieldSelector;
|
||||
|
||||
if (selector.includes('@')) {
|
||||
const [sel, attr] = selector.split('@');
|
||||
if (sel.length === 0) {
|
||||
value = container.attr(attr.trim());
|
||||
} else {
|
||||
value = container.find(sel.trim()).attr(attr.trim());
|
||||
}
|
||||
} else {
|
||||
value = container.find(selector.trim()).text();
|
||||
}
|
||||
|
||||
// Apply modifiers if specified
|
||||
if (fieldSelector.includes('|')) {
|
||||
/* eslint-disable no-unused-vars */
|
||||
const [_, ...modifiers] = fieldSelector.split('|').map((s) => s.trim());
|
||||
/* eslint-enable no-unused-vars */
|
||||
value = applyModifiers(value, modifiers);
|
||||
}
|
||||
|
||||
parsedObject[key] = value || null;
|
||||
} catch (error) {
|
||||
console.error(`Error parsing field '${key}' with selector '${fieldSelector}':`, error);
|
||||
parsedObject[key] = null;
|
||||
}
|
||||
}
|
||||
|
||||
if (parsedObject.id != null) {
|
||||
result.push(parsedObject);
|
||||
} else {
|
||||
console.warn('ID not found. Not relaying object.');
|
||||
}
|
||||
});
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
// Helper function to apply modifiers
|
||||
function applyModifiers(value, modifiers) {
|
||||
if (!value) return value;
|
||||
|
||||
modifiers.forEach((modifier) => {
|
||||
switch (modifier) {
|
||||
case 'int':
|
||||
value = parseInt(value, 10);
|
||||
break;
|
||||
case 'trim':
|
||||
value = value.replace(/\s+/g, ' ').trim();
|
||||
break;
|
||||
case 'removeNewline':
|
||||
value = value.replace(/\n/g, ' ');
|
||||
break;
|
||||
default:
|
||||
console.warn(`Unknown modifier: ${modifier}`);
|
||||
}
|
||||
});
|
||||
|
||||
return value;
|
||||
}
|
||||
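The field syntax that parser.js accepts (`selector@attribute | modifier | modifier`) can be illustrated without cheerio. The following is a minimal sketch, not code from this diff: `splitFieldSpec` and `applyModifiersSketch` are hypothetical names, and the type guard before each modifier is an addition of the sketch that the parser above does not have.

```javascript
// Illustrative sketch only: splitFieldSpec and applyModifiersSketch are
// hypothetical helpers, not functions from the diff.

// Split a crawlFields entry such as 'a@title | removeNewline | trim'
// into its selector, optional attribute, and list of modifiers.
function splitFieldSpec(fieldSpec) {
  const [selectorPart, ...modifiers] = fieldSpec.split('|').map((part) => part.trim());
  const atIndex = selectorPart.indexOf('@');
  if (atIndex === -1) {
    // No '@': the value is read from the element's text content.
    return { selector: selectorPart, attribute: null, modifiers };
  }
  return {
    // An empty selector means "read the attribute from the container itself".
    selector: selectorPart.slice(0, atIndex).trim(),
    attribute: selectorPart.slice(atIndex + 1).trim(),
    modifiers,
  };
}

// Apply the three modifiers known to parser.js, left to right.
function applyModifiersSketch(value, modifiers) {
  return modifiers.reduce((current, modifier) => {
    // Guard added in this sketch: string modifiers are skipped after 'int'.
    if (typeof current !== 'string') return current;
    switch (modifier) {
      case 'int':
        return parseInt(current, 10);
      case 'trim':
        return current.replace(/\s+/g, ' ').trim();
      case 'removeNewline':
        return current.replace(/\n/g, ' ');
      default:
        return current; // unknown modifiers are ignored here
    }
  }, value);
}

const spec = splitFieldSpec('a@title | removeNewline | trim');
console.log(spec);
console.log(applyModifiersSketch('  3-Zimmer\nWohnung  ', spec.modifiers));
```

A spec such as `'@data-id'` yields an empty selector, which corresponds to the `sel.length === 0` branch in the parser, where the attribute is read from the container element itself.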
49
lib/services/extractor/puppeteerExtractor.js
Normal file
@@ -0,0 +1,49 @@
|
||||
import puppeteer from 'puppeteer-extra';
|
||||
import StealthPlugin from 'puppeteer-extra-plugin-stealth';
|
||||
import { debug, DEFAULT_HEADER, botDetected } from './utils.js';
|
||||
|
||||
puppeteer.use(StealthPlugin());
|
||||
|
||||
export default async function execute(url, waitForSelector, options) {
|
||||
let browser;
|
||||
try {
|
||||
debug(`Sending request to ${url} using Puppeteer.`);
|
||||
|
||||
browser = await puppeteer.launch({
|
||||
headless: options.puppeteerHeadless ?? true,
|
||||
args: ['--no-sandbox', '--disable-gpu', '--disable-setuid-sandbox'],
|
||||
timeout: options.puppeteerTimeout || 30_000,
|
||||
});
|
||||
let page = await browser.newPage();
|
||||
await page.setExtraHTTPHeaders(DEFAULT_HEADER);
|
||||
const response = await page.goto(url, {
|
||||
waitUntil: 'domcontentloaded',
|
||||
});
|
||||
let pageSource;
|
||||
//if we're extracting data from a spa, we must wait for the selector
|
||||
if (waitForSelector != null) {
|
||||
await page.waitForSelector(waitForSelector);
|
||||
pageSource = await page.evaluate((selector) => {
|
||||
return document.querySelector(selector).innerHTML;
|
||||
}, waitForSelector);
|
||||
} else {
|
||||
pageSource = await page.content();
|
||||
}
|
||||
|
||||
const statusCode = response.status();
|
||||
|
||||
if (botDetected(pageSource, statusCode)) {
|
||||
console.warn('We have been detected as a bot :-/ Tried url: => ', url);
|
||||
return null;
|
||||
}
|
||||
|
||||
return await page.content();
|
||||
} catch (error) {
|
||||
console.error('Error executing with puppeteer executor', error);
|
||||
return null;
|
||||
} finally {
|
||||
if (browser != null) {
|
||||
await browser.close();
|
||||
}
|
||||
}
|
||||
}
|
||||
32
lib/services/extractor/utils.js
Normal file
@@ -0,0 +1,32 @@
|
||||
let debuggingOn = false;
|
||||
|
||||
export const DEFAULT_HEADER = {
|
||||
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
|
||||
'Accept-Language': 'en-US,en;q=0.5',
|
||||
Connection: 'keep-alive',
|
||||
'Upgrade-Insecure-Requests': '1',
|
||||
'User-Agent':
|
||||
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
|
||||
};
|
||||
|
||||
export const setDebug = (options) => {
|
||||
debuggingOn = !!options?.debug;
|
||||
};
|
||||
|
||||
export const debug = (message) => {
|
||||
if (debuggingOn) {
|
||||
/* eslint-disable no-console */
|
||||
console.debug(message);
|
||||
/* eslint-enable no-console */
|
||||
}
|
||||
};
|
||||
|
||||
export const botDetected = (pageSource, statusCode) => {
|
||||
const suspiciousStatusCodes = [403, 429];
|
||||
const botDetectionPatterns = [/verify you are human/i, /access denied/i, /x-amz-cf-id/i];
|
||||
|
||||
const detectedInSource = botDetectionPatterns.some((pattern) => pattern.test(pageSource));
|
||||
const detectedByStatus = suspiciousStatusCodes.includes(statusCode);
|
||||
|
||||
return detectedInSource || detectedByStatus;
|
||||
};
|
||||
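To see what the `botDetected` heuristic in utils.js flags, it can be run against a few sample responses. This is a sketch under stated assumptions: the function is restated from the diff so the snippet runs standalone, and the sample HTML strings and labels are invented for illustration.

```javascript
// Restated from utils.js in this diff so the snippet is self-contained.
const botDetected = (pageSource, statusCode) => {
  const suspiciousStatusCodes = [403, 429];
  const botDetectionPatterns = [/verify you are human/i, /access denied/i, /x-amz-cf-id/i];
  return (
    botDetectionPatterns.some((pattern) => pattern.test(pageSource)) ||
    suspiciousStatusCodes.includes(statusCode)
  );
};

// Hypothetical sample responses, made up for this illustration.
const samples = [
  { label: 'normal listing page', html: '<div class="wgg_card">2 rooms</div>', status: 200 },
  { label: 'rate limited', html: '', status: 429 },
  { label: 'captcha page', html: '<h1>Please verify you are human</h1>', status: 200 },
  { label: 'listing mentioning the phrase', html: '<p>Roof terrace: access denied in winter</p>', status: 200 },
];

for (const sample of samples) {
  console.log(`${sample.label}: ${botDetected(sample.html, sample.status)}`);
}
```

The last sample shows a trade-off of matching on page text: a regular listing that happens to contain one of the phrases is also treated as a bot-detection page.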
@@ -1,77 +0,0 @@
|
||||
import fetch from 'node-fetch';
|
||||
import { config } from '../utils.js';
|
||||
import { makeUrlResidential } from './scrapingAnt.js';
|
||||
import https from 'https';
|
||||
//if ScrapingAnt got blocked, this http status is returned
|
||||
const BLOCKED_HTTP_STATUS = 423;
|
||||
const NOT_FOUND_HTTP_STATUS = 404;
|
||||
const MAX_RETRIES_SCRAPING_ANT = 10;
|
||||
const EXPECTED_STATUS_CODES = [BLOCKED_HTTP_STATUS, NOT_FOUND_HTTP_STATUS];
|
||||
const agent = new https.Agent({
|
||||
rejectUnauthorized: false,
|
||||
});
|
||||
|
||||
function makeDriver(headers = {}) {
|
||||
let cookies = '';
|
||||
async function scrapingAntDriver(context, callback, retryCounter = 0) {
|
||||
const proxyType = config.scrapingAnt?.proxy || 'datacenter';
|
||||
try {
|
||||
const url = proxyType === 'residential' ? makeUrlResidential(context.url) : context.url;
|
||||
const response = await fetch(url, {
|
||||
headers: {
|
||||
...headers,
|
||||
cookie: cookies,
|
||||
},
|
||||
});
|
||||
const result = await response.text();
|
||||
if (EXPECTED_STATUS_CODES.includes(response.status)) {
|
||||
throw new Error(`${response.status}`);
|
||||
}
|
||||
if (cookies.length === 0) {
|
||||
cookies = response.headers.raw()['set-cookie'] || [];
|
||||
}
|
||||
callback(null, result);
|
||||
} catch (exception) {
|
||||
/* eslint-disable no-console */
|
||||
if (!EXPECTED_STATUS_CODES.includes(exception.response?.status) && !EXPECTED_STATUS_CODES.includes(Number(exception.message))) {
|
||||
console.error(`Error while trying to scrape data from scraping ant. Received error: ${exception.message}`);
|
||||
callback(null, []);
|
||||
return;
|
||||
}
|
||||
if (retryCounter <= MAX_RETRIES_SCRAPING_ANT) {
|
||||
retryCounter++;
|
||||
console.debug(`ScrapingAnt got blocked. Retrying ${retryCounter} / ${MAX_RETRIES_SCRAPING_ANT}`);
|
||||
await scrapingAntDriver(context, callback, retryCounter);
|
||||
} else {
|
||||
console.error(`Error while trying to scrape data from scraping ant. Received error: ${exception.message}`);
|
||||
callback(null, []);
|
||||
}
|
||||
/* eslint-enable no-console */
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* The regular request driver takes care of everything that doesn't need to be scraped by ScrapingAnt (which is
|
||||
* everything != Immoscout & Immonet as of writing this)
|
||||
*/
|
||||
return async function driver(context, callback) {
|
||||
if (context.url.toLowerCase().indexOf('scrapingant') !== -1) {
|
||||
return scrapingAntDriver(context, callback);
|
||||
}
|
||||
try {
|
||||
const response = await fetch(context.url, {
|
||||
headers: {
|
||||
...headers,
|
||||
Cookie: cookies,
|
||||
},
|
||||
agent,
|
||||
});
|
||||
const result = await response.text();
|
||||
callback(null, result);
|
||||
} catch (exception) {
|
||||
console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
|
||||
callback(null, []);
|
||||
}
|
||||
};
|
||||
}
|
||||
export default makeDriver;
|
||||
@@ -1,36 +0,0 @@
|
||||
import { config } from '../utils.js';
|
||||
import makeDriver from './requestDriver.js';
|
||||
import Xray from 'x-ray';
|
||||
class Scraper {
|
||||
constructor() {
|
||||
const filters = {
|
||||
removeNewline: this._removeNewline,
|
||||
trim: this._trim,
|
||||
int: this._int,
|
||||
};
|
||||
const headers = {
|
||||
'User-Agent':
|
||||
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36',
|
||||
};
|
||||
if (config.scrapingAnt != null && config.scrapingAnt.apiKey != null) {
|
||||
headers['x-api-key'] = config.scrapingAnt.apiKey;
|
||||
}
|
||||
const driver = makeDriver(headers);
|
||||
const xray = Xray({ filters });
|
||||
xray.driver(driver);
|
||||
this.xray = xray;
|
||||
}
|
||||
get x() {
|
||||
return this.xray;
|
||||
}
|
||||
_removeNewline(value) {
|
||||
return typeof value === 'string' ? value.replace(/\\n/g, '') : value;
|
||||
}
|
||||
_trim(value) {
|
||||
return typeof value === 'string' ? value.replace(/\s+/g, ' ').trim() : value;
|
||||
}
|
||||
_int(value) {
|
||||
return typeof value === 'string' ? parseInt(value, 10) : value;
|
||||
}
|
||||
}
|
||||
export default new Scraper().x;
|
||||
@@ -1,30 +0,0 @@
|
||||
import { metaInformation as immoScoutInfo } from '../provider/immoscout.js';
|
||||
import { metaInformation as immoNetInfo } from '../provider/immonet.js';
|
||||
import { metaInformation as neuBauCompassInfo } from '../provider/neubauKompass.js';
|
||||
import { config } from '../utils.js';
|
||||
|
||||
const additionalImmonetUrlParams = `&wait_for_selector=.content-wrapper-tiles&js_snippet=${Buffer.from(
|
||||
'window.scrollTo(0,document.body.scrollHeight);'
|
||||
).toString('base64')}`;
|
||||
|
||||
const needScrapingAnt = (id) => {
|
||||
return id.toLowerCase() === immoScoutInfo.id || id.toLowerCase() === immoNetInfo.id || id.toLowerCase() === neuBauCompassInfo.id.toLowerCase();
|
||||
};
|
||||
export const transformUrlForScrapingAnt = (url, id) => {
|
||||
let urlParams = '';
|
||||
if (needScrapingAnt(id)) {
|
||||
if (id.toLowerCase() === immoNetInfo.id) {
|
||||
urlParams = additionalImmonetUrlParams;
|
||||
}
|
||||
//only do calls to scrapingAnt when dealing with Immoscout/Immonet
|
||||
url = `https://api.scrapingant.com/v2/general?url=${encodeURIComponent(url)}&proxy_type=datacenter${urlParams}`;
|
||||
}
|
||||
return url;
|
||||
};
|
||||
export const isScrapingAntApiKeySet = () => {
|
||||
return config.scrapingAnt != null && config.scrapingAnt.apiKey != null && config.scrapingAnt.apiKey.length > 8;
|
||||
};
|
||||
export const makeUrlResidential = (url) => {
|
||||
return url.replace('datacenter', 'residential');
|
||||
};
|
||||
export { needScrapingAnt };
|
||||
@@ -1,72 +1,90 @@
|
||||
import Mixpanel from 'mixpanel';
|
||||
import { getJobs } from '../storage/jobStorage.js';
|
||||
import { getUniqueId } from './uniqueId.js';
|
||||
import { config, inDevMode } from '../../utils.js';
|
||||
import {getJobs} from '../storage/jobStorage.js';
|
||||
import {getUniqueId} from './uniqueId.js';
|
||||
import {config, inDevMode} from '../../utils.js';
|
||||
import os from 'os';
|
||||
import {readFileSync} from 'fs';
|
||||
import {packageUp} from 'package-up';
|
||||
|
||||
const mixpanelTracker = Mixpanel.init('718670ef1c58c0208256c1e408a3d75e');
|
||||
|
||||
const distinct_id = getUniqueId() || 'N/A';
|
||||
const version = await getPackageVersion();
|
||||
|
||||
export const track = function () {
|
||||
//only send tracking information if the user allowed to do so.
|
||||
if (config.analyticsEnabled && !inDevMode()) {
|
||||
const activeProvider = new Set();
|
||||
const activeAdapter = new Set();
|
||||
//only send tracking information if the user allowed to do so.
|
||||
if (config.analyticsEnabled && !inDevMode()) {
|
||||
const activeProvider = new Set();
|
||||
const activeAdapter = new Set();
|
||||
|
||||
const jobs = getJobs();
|
||||
const jobs = getJobs();
|
||||
|
||||
if (jobs != null && jobs.length > 0) {
|
||||
jobs.forEach((job) => {
|
||||
job.provider.forEach((provider) => {
|
||||
activeProvider.add(provider.id);
|
||||
});
|
||||
job.notificationAdapter.forEach((adapter) => {
|
||||
activeAdapter.add(adapter.id);
|
||||
});
|
||||
});
|
||||
if (jobs != null && jobs.length > 0) {
|
||||
jobs.forEach((job) => {
|
||||
job.provider.forEach((provider) => {
|
||||
activeProvider.add(provider.id);
|
||||
});
|
||||
job.notificationAdapter.forEach((adapter) => {
|
||||
activeAdapter.add(adapter.id);
|
||||
});
|
||||
});
|
||||
|
||||
mixpanelTracker.track(
|
||||
'fredy_tracking',
|
||||
enrichTrackingObject({
|
||||
adapter: Array.from(activeAdapter),
|
||||
provider: Array.from(activeProvider),
|
||||
}),
|
||||
);
|
||||
mixpanelTracker.track(
|
||||
'fredy_tracking',
|
||||
enrichTrackingObject({
|
||||
adapter: Array.from(activeAdapter),
|
||||
provider: Array.from(activeProvider),
|
||||
}),
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
/**
|
||||
* Note, this will only be used when Fredy runs in demo mode
|
||||
*/
|
||||
export function trackDemoJobCreated(jobData) {
|
||||
if (config.analyticsEnabled && !inDevMode() && config.demoMode) {
|
||||
mixpanelTracker.track('demoJobCreated', enrichTrackingObject(jobData));
|
||||
}
|
||||
if (config.analyticsEnabled && !inDevMode() && config.demoMode) {
|
||||
mixpanelTracker.track('demoJobCreated', enrichTrackingObject(jobData));
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Note, this will only be used when Fredy runs in demo mode
|
||||
*/
|
||||
export function trackDemoAccessed() {
|
||||
if (config.analyticsEnabled && !inDevMode() && config.demoMode) {
|
||||
mixpanelTracker.track('demoAccessed', enrichTrackingObject({}));
|
||||
}
|
||||
if (config.analyticsEnabled && !inDevMode() && config.demoMode) {
|
||||
mixpanelTracker.track('demoAccessed', enrichTrackingObject({}));
|
||||
}
|
||||
}
|
||||
|
||||
function enrichTrackingObject(trackingObject) {
|
||||
const platform = process.platform;
|
||||
const arch = process.arch;
|
||||
const language = process.env.LANG || 'en';
|
||||
const nodeVersion = process.version || 'N/A';
|
||||
const operating_system = os.platform();
|
||||
const os_version = os.release();
|
||||
const arch = process.arch;
|
||||
const language = process.env.LANG || 'en';
|
||||
const nodeVersion = process.version || 'N/A';
|
||||
|
||||
return {
|
||||
...trackingObject,
|
||||
isDemo: config.demoMode,
|
||||
platform,
|
||||
arch,
|
||||
nodeVersion,
|
||||
language,
|
||||
distinct_id,
|
||||
};
|
||||
return {
|
||||
...trackingObject,
|
||||
isDemo: config.demoMode,
|
||||
operating_system,
|
||||
os_version,
|
||||
arch,
|
||||
nodeVersion,
|
||||
language,
|
||||
distinct_id,
|
||||
fredy_version: version
|
||||
};
|
||||
}
|
||||
|
||||
async function getPackageVersion() {
|
||||
try {
|
||||
const packagePath = await packageUp();
|
||||
const packageJson = readFileSync(packagePath, 'utf8');
|
||||
const json = JSON.parse(packageJson);
|
||||
return json.version;
|
||||
} catch (error) {
|
||||
console.error('Error reading version from package.json', error);
|
||||
}
|
||||
return 'N/A';
|
||||
}
|
||||
|
||||
40
package.json
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "fredy",
|
||||
"version": "10.4.1",
|
||||
"version": "11.0.5",
|
||||
"description": "[F]ind [R]eal [E]states [d]amn eas[y].",
|
||||
"scripts": {
|
||||
"start": "node prod.js",
|
||||
@@ -50,28 +50,33 @@
|
||||
"Firefox ESR"
|
||||
],
|
||||
"dependencies": {
|
||||
"@douyinfe/semi-ui": "2.70.1",
|
||||
"@douyinfe/semi-ui": "2.75.0",
|
||||
"@rematch/core": "2.2.0",
|
||||
"@rematch/loading": "2.1.2",
|
||||
"@sendgrid/mail": "8.1.4",
|
||||
"@vitejs/plugin-react": "4.3.4",
|
||||
"better-sqlite3": "^11.6.0",
|
||||
"better-sqlite3": "^11.8.1",
|
||||
"body-parser": "1.20.3",
|
||||
"cheerio": "^1.0.0",
|
||||
"cookie-session": "2.1.0",
|
||||
"handlebars": "4.7.8",
|
||||
"highcharts": "12.0.1",
|
||||
"highcharts": "12.1.2",
|
||||
"highcharts-react-official": "3.2.1",
|
||||
"lodash": "4.17.21",
|
||||
"lowdb": "6.0.1",
|
||||
"markdown": "^0.5.0",
|
||||
"mixpanel": "^0.18.0",
|
||||
"nanoid": "5.0.9",
|
||||
"nanoid": "5.1.2",
|
||||
"node-fetch": "3.3.2",
|
||||
"node-mailjet": "6.0.6",
|
||||
"package-up": "^5.0.0",
|
||||
"puppeteer": "^24.2.1",
|
||||
"puppeteer-extra": "^3.3.6",
|
||||
"puppeteer-extra-plugin-stealth": "^2.11.2",
|
||||
"query-string": "9.1.1",
|
||||
"react": "18.3.1",
|
||||
"react-dom": "18.3.1",
|
||||
"react-redux": "9.1.2",
|
||||
"react-redux": "9.2.0",
|
||||
"react-router": "5.2.1",
|
||||
"react-router-dom": "5.3.0",
|
||||
"redux": "5.0.1",
|
||||
@@ -80,25 +85,24 @@
|
||||
"serve-static": "1.16.2",
|
||||
"slack": "11.0.2",
|
||||
"string-similarity": "^4.0.4",
|
||||
"vite": "5.4.11",
|
||||
"x-ray": "2.3.4"
|
||||
"vite": "5.4.11"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@babel/core": "7.26.0",
|
||||
"@babel/eslint-parser": "7.25.9",
|
||||
"@babel/preset-env": "7.26.0",
|
||||
"@babel/preset-react": "7.25.9",
|
||||
"chai": "5.1.2",
|
||||
"@babel/core": "7.26.9",
|
||||
"@babel/eslint-parser": "7.26.8",
|
||||
"@babel/preset-env": "7.26.9",
|
||||
"@babel/preset-react": "7.26.3",
|
||||
"chai": "5.2.0",
|
||||
"eslint": "8.56.0",
|
||||
"eslint-config-prettier": "8.8.0",
|
||||
"eslint-plugin-react": "7.37.2",
|
||||
"esmock": "2.6.9",
|
||||
"eslint-plugin-react": "7.37.4",
|
||||
"esmock": "2.7.0",
|
||||
"history": "5.3.0",
|
||||
"husky": "9.1.7",
|
||||
"less": "4.2.1",
|
||||
"lint-staged": "15.2.10",
|
||||
"less": "4.2.2",
|
||||
"lint-staged": "15.4.3",
|
||||
"mocha": "10.8.2",
|
||||
"prettier": "3.3.3",
|
||||
"prettier": "3.5.2",
|
||||
"redux-logger": "3.0.6"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,48 +1,38 @@
|
||||
import * as similarityCache from '../../lib/services/similarity-check/similarityCache.js';
|
||||
import { get } from '../mocks/mockNotification.js';
|
||||
import { mockFredy, providerConfig } from '../utils.js';
|
||||
import { expect } from 'chai';
|
||||
import {get} from '../mocks/mockNotification.js';
|
||||
import {mockFredy, providerConfig} from '../utils.js';
|
||||
import {expect} from 'chai';
|
||||
import * as provider from '../../lib/provider/immonet.js';
|
||||
import * as scrapingAnt from '../../lib/services/scrapingAnt.js';
|
||||
|
||||
describe('#immonet testsuite()', () => {
|
||||
after(() => {
|
||||
similarityCache.stopCacheCleanup();
|
||||
});
|
||||
provider.init(providerConfig.immonet, [], []);
|
||||
it('should test immonet provider', async () => {
|
||||
const Fredy = await mockFredy();
|
||||
return await new Promise((resolve) => {
|
||||
if (!scrapingAnt.isScrapingAntApiKeySet()) {
|
||||
/* eslint-disable no-console */
|
||||
console.info('Skipping Immonet test as ScrapingAnt Api Key is not set.');
|
||||
/* eslint-enable no-console */
|
||||
resolve();
|
||||
return;
|
||||
}
|
||||
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'immonet', similarityCache);
|
||||
fredy.execute().then((listing) => {
|
||||
expect(listing).to.be.a('array');
|
||||
const notificationObj = get();
|
||||
expect(notificationObj).to.be.a('object');
|
||||
expect(notificationObj.serviceName).to.equal('immonet');
|
||||
notificationObj.payload.forEach((notify) => {
|
||||
/** check the actual structure **/
|
||||
expect(notify.id).to.be.a('string');
|
||||
expect(notify.price).to.be.a('string');
|
||||
expect(notify.size).to.be.a('string');
|
||||
expect(notify.title).to.be.a('string');
|
||||
expect(notify.link).to.be.a('string');
|
||||
expect(notify.address).to.be.a('string');
|
||||
|
||||
/** check the values if possible **/
|
||||
expect(notify.price).that.does.include('€');
|
||||
expect(notify.size).that.does.include('m²');
|
||||
expect(notify.title).to.be.not.empty;
|
||||
expect(notify.address).to.be.not.empty;
|
||||
});
|
||||
resolve();
|
||||
});
|
||||
after(() => {
|
||||
similarityCache.stopCacheCleanup();
|
||||
});
|
||||
provider.init(providerConfig.immonet, [], []);
|
||||
it('should test immonet provider', async () => {
|
||||
const Fredy = await mockFredy();
|
||||
return await new Promise((resolve) => {
|
||||
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'immonet', similarityCache);
|
||||
fredy.execute().then((listing) => {
|
||||
expect(listing).to.be.a('array');
|
||||
const notificationObj = get();
|
||||
expect(notificationObj).to.be.a('object');
|
||||
expect(notificationObj.serviceName).to.equal('immonet');
|
||||
notificationObj.payload.forEach((notify) => {
|
||||
/** check the actual structure **/
|
||||
expect(notify.id).to.be.a('string');
|
||||
expect(notify.price).to.be.a('string');
|
||||
expect(notify.size).to.be.a('string');
|
||||
expect(notify.title).to.be.a('string');
|
||||
expect(notify.link).to.be.a('string');
|
||||
expect(notify.address).to.be.a('string');
|
||||
|
||||
expect(notify.size).that.does.include('m²');
|
||||
expect(notify.title).to.be.not.empty;
|
||||
expect(notify.address).to.be.not.empty;
|
||||
});
|
||||
resolve();
|
||||
});
|
||||
});
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
@@ -1,48 +1,43 @@
|
||||
import * as similarityCache from '../../lib/services/similarity-check/similarityCache.js';
|
||||
import { get } from '../mocks/mockNotification.js';
|
||||
import { mockFredy, providerConfig } from '../utils.js';
|
||||
import { expect } from 'chai';
|
||||
//import {get} from '../mocks/mockNotification.js';
|
||||
import {/*mockFredy, */providerConfig} from '../utils.js';
|
||||
//import {expect} from 'chai';
|
||||
import * as provider from '../../lib/provider/immoscout.js';
|
||||
import * as scrapingAnt from '../../lib/services/scrapingAnt.js';
|
||||
|
||||
describe('#immoscout testsuite()', () => {
|
||||
after(() => {
|
||||
similarityCache.stopCacheCleanup();
|
||||
});
|
||||
provider.init(providerConfig.immoscout, [], []);
|
||||
it('should test immoscout provider', async () => {
|
||||
const Fredy = await mockFredy();
|
||||
return await new Promise((resolve) => {
|
||||
if (!scrapingAnt.isScrapingAntApiKeySet()) {
|
||||
/* eslint-disable no-console */
|
||||
console.info('Skipping Immoscout test as ScrapingAnt Api Key is not set.');
|
||||
/* eslint-enable no-console */
|
||||
resolve();
|
||||
return;
|
||||
}
|
||||
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'immoscout', similarityCache);
|
||||
fredy.execute().then((listing) => {
|
||||
expect(listing).to.be.a('array');
|
||||
const notificationObj = get();
|
||||
expect(notificationObj).to.be.a('object');
|
||||
expect(notificationObj.serviceName).to.equal('immoscout');
|
||||
notificationObj.payload.forEach((notify) => {
|
||||
/** check the actual structure **/
|
||||
expect(notify.id).to.be.a('number');
|
||||
expect(notify.price).to.be.a('string');
|
||||
expect(notify.size).to.be.a('string');
|
||||
expect(notify.title).to.be.a('string');
|
||||
expect(notify.link).to.be.a('string');
|
||||
expect(notify.address).to.be.a('string');
|
||||
/** check the values if possible **/
|
||||
expect(notify.price).that.does.include('€');
|
||||
expect(notify.size).that.does.include('m²');
|
||||
expect(notify.title).to.be.not.empty;
|
||||
expect(notify.link).that.does.include('https://www.immobilienscout24.de');
|
||||
expect(notify.address).to.be.not.empty;
|
||||
});
|
||||
resolve();
|
||||
});
|
||||
after(() => {
|
||||
similarityCache.stopCacheCleanup();
|
||||
});
|
||||
provider.init(providerConfig.immoscout, [], []);
|
||||
it('should test immoscout provider', async () => {
|
||||
//const Fredy = await mockFredy();
|
||||
return await new Promise((resolve) => {
|
||||
/* eslint-disable no-console */
|
||||
console.info('Skipping Immoscout test for now until we have figured out how to bypass bot detection.');
|
||||
/* eslint-enable no-console */
|
||||
resolve();
|
||||
/*
|
||||
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'immoscout', similarityCache);
|
||||
fredy.execute().then((listing) => {
|
||||
expect(listing).to.be.a('array');
|
||||
const notificationObj = get();
|
||||
expect(notificationObj).to.be.a('object');
|
||||
expect(notificationObj.serviceName).to.equal('immoscout');
|
||||
notificationObj.payload.forEach((notify) => {
|
||||
expect(notify.id).to.be.a('number');
|
||||
expect(notify.price).to.be.a('string');
|
||||
expect(notify.size).to.be.a('string');
|
||||
expect(notify.title).to.be.a('string');
|
||||
expect(notify.link).to.be.a('string');
|
||||
expect(notify.address).to.be.a('string');
|
||||
expect(notify.price).that.does.include('€');
|
||||
expect(notify.size).that.does.include('m²');
|
||||
expect(notify.title).to.be.not.empty;
|
||||
expect(notify.link).that.does.include('https://www.immobilienscout24.de');
|
||||
expect(notify.address).to.be.not.empty;
|
||||
});
|
||||
resolve();
|
||||
});*/
|
||||
});
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
@@ -3,7 +3,6 @@ import {get} from '../mocks/mockNotification.js';
|
||||
import {mockFredy, providerConfig} from '../utils.js';
|
||||
import {expect} from 'chai';
|
||||
import * as provider from '../../lib/provider/neubauKompass.js';
|
||||
import * as scrapingAnt from '../../lib/services/scrapingAnt.js';
|
||||
|
||||
describe('#neubauKompass testsuite()', () => {
|
||||
after(() => {
|
||||
@@ -13,13 +12,6 @@ describe('#neubauKompass testsuite()', () => {
|
||||
it('should test neubauKompass provider', async () => {
|
||||
const Fredy = await mockFredy();
|
||||
return await new Promise((resolve) => {
|
||||
if (!scrapingAnt.isScrapingAntApiKeySet()) {
|
||||
/* eslint-disable no-console */
|
||||
console.info('Skipping Neubaukompass test as ScrapingAnt Api Key is not set.');
|
||||
/* eslint-enable no-console */
|
||||
resolve();
|
||||
return;
|
||||
}
|
||||
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'neubauKompass', similarityCache);
|
||||
fredy.execute().then((listing) => {
|
||||
expect(listing).to.be.a('array');
|
||||
|
||||
@@ -9,7 +9,7 @@
|
||||
"enabled": true
|
||||
},
|
||||
"immonet": {
|
||||
"url": "https://www.immonet.de/immobiliensuche/beta?pageoffset=1&listsize=100&objecttype=1&locationname=D%C3%BCsseldorf&acid=&actype=&district=8717&district=8718&district=8719&district=8720&district=8721&district=8723&district=8724&district=8725&district=8727&district=8728&district=8729&district=8730&district=8731&district=8732&district=8733&district=8737&district=8738&district=8741&district=8745&district=8747&district=8750&district=8752&district=8754&district=8755&district=8756&district=8759&district=8760&district=8761&district=8763&district=8764&district=8765&ajaxIsRadiusActive=false&sortby=19&suchart=1&radius=0&pcatmtypes=1_1&pCatMTypeStoragefield=&parentcat=1&marketingtype=1&fromprice=&toprice=420000&fromarea=90&toarea=&fromplotarea=&toplotarea=&fromrooms=3&torooms=&objectcat=225&objectcat=18&objectcat=17&objectcat=12&objectcat=16&objectcat=181&objectcat=14&objectcat=15&objectcat=226&objectcat=13&wbs=-1&fromyear=&toyear=",
|
||||
"url": "https://www.immonet.de/classified-search?distributionTypes=Buy,Buy_Auction,Compulsory_Auction&estateTypes=House,Apartment&locations=AD08DE2112&order=Default&m=homepage_new_search_classified_search_result",
|
||||
"enabled": true
|
||||
},
|
||||
"immowelt": {
|
||||
|
||||
@@ -2,13 +2,13 @@ import React from 'react';
|
||||
|
||||
import {useDispatch, useSelector} from 'react-redux';
|
||||
|
||||
import {Divider, Input, Radio, TimePicker, Button, RadioGroup, Checkbox} from '@douyinfe/semi-ui';
|
||||
import {Divider, TimePicker, Button, Checkbox} from '@douyinfe/semi-ui';
|
||||
import {InputNumber} from '@douyinfe/semi-ui';
|
||||
import Headline from '../../components/headline/Headline';
|
||||
import {xhrPost} from '../../services/xhr';
|
||||
import {SegmentPart} from '../../components/segment/SegmentPart';
|
||||
import {Banner, Toast} from '@douyinfe/semi-ui';
|
||||
import {IconSave, IconCalendar, IconKey, IconRefresh, IconSignal, IconLineChartStroked, IconSearch} from '@douyinfe/semi-icons';
|
||||
import {IconSave, IconCalendar, IconRefresh, IconSignal, IconLineChartStroked, IconSearch} from '@douyinfe/semi-icons';
|
||||
import './GeneralSettings.less';
|
||||
|
||||
function formatFromTimestamp(ts) {
|
||||
@@ -35,8 +35,6 @@ const GeneralSettings = function GeneralSettings() {
|
||||
|
||||
const [interval, setInterval] = React.useState('');
|
||||
const [port, setPort] = React.useState('');
|
||||
const [scrapingAntApiKey, setScrapingAntApiKey] = React.useState('');
|
||||
const [scrapingAntProxy, setScrapingAntProxy] = React.useState('');
|
||||
const [workingHourFrom, setWorkingHourFrom] = React.useState(null);
|
||||
const [workingHourTo, setWorkingHourTo] = React.useState(null);
|
||||
const [demoMode, setDemoMode] = React.useState(null);
|
||||
@@ -55,10 +53,8 @@ const GeneralSettings = function GeneralSettings() {
|
||||
async function init() {
|
||||
setInterval(settings?.interval);
|
||||
setPort(settings?.port);
|
||||
setScrapingAntApiKey(settings?.scrapingAnt?.apiKey);
|
||||
setWorkingHourFrom(settings?.workingHours?.from);
|
||||
setWorkingHourTo(settings?.workingHours?.to);
|
||||
setScrapingAntProxy(settings?.scrapingAnt?.proxy || 'datacenter');
|
||||
setAnalyticsEnabled(settings?.analyticsEnabled || false);
|
||||
setDemoMode(settings?.demoMode || false);
|
||||
}
|
||||
@@ -96,10 +92,6 @@ const GeneralSettings = function GeneralSettings() {
|
||||
await xhrPost('/api/admin/generalSettings', {
|
||||
interval,
|
||||
port,
|
||||
scrapingAnt: {
|
||||
apiKey: scrapingAntApiKey,
|
||||
proxy: scrapingAntProxy,
|
||||
},
|
||||
workingHours: {
|
||||
from: workingHourFrom,
|
||||
to: workingHourTo,
|
||||
@@ -155,68 +147,6 @@ const GeneralSettings = function GeneralSettings() {
/>
</SegmentPart>
<Divider margin="1rem"/>
<SegmentPart
name="ScrapingAnt Api Key"
helpText="The api key for ScrapingAnt is used to be able to scrape Immoscout."
Icon={IconKey}
>
<Input
type="text"
placeholder="ScrapingAnt Api Key"
value={scrapingAntApiKey}
onChange={(val) => setScrapingAntApiKey(val)}
/>
</SegmentPart>
<Divider margin="1rem"/>
<SegmentPart
name="ScrapingAnt proxy settings"
helpText="Scraping ant provides different proxies."
Icon={IconKey}
>
<Banner
fullMode={false}
type="info"
closeIcon={null}
title={
<div style={{fontWeight: 600, fontSize: '14px', lineHeight: '20px'}}>
ScrapingAnt is needed to scrape Immoscout. ScrapingAnt itself is using 2
different types of proxies
</div>
}
style={{marginBottom: '1rem'}}
description={
<div>
<h4>Datacenter-Proxy</h4>
Proxy server located in one of the datacenters across the world. Datacenter
proxies are slower and
more likely to fail, but they are cheaper. A call with a datacenter proxy cost
10 credits.
<h4>Residential-Proxy</h4>
High-quality proxy server located in one of the real people houses across the
world. Datacenter
proxies are faster and more likely to success, but they are more expensive.
<br/>
<br/>
<b>
On the free tier, you have 10.000 credits, so chose your option wisely. Keep
in mind, only
successful calls will be charged.
</b>
</div>
}
/>

<RadioGroup value={scrapingAntProxy} onChange={(e) => setScrapingAntProxy(e.target.value)}>
<Radio name="datacenter" value="datacenter" checked={scrapingAntProxy === 'datacenter'}>
Datacenter proxy
</Radio>
<Radio name="residential" value="residential"
checked={scrapingAntProxy === 'residential'}>
Residential proxy
</Radio>
</RadioGroup>
</SegmentPart>
<Divider margin="1rem"/>
<SegmentPart
name="Working hours"
helpText="During this hours, Fredy will search for new apartments. If nothing is configured, Fredy will search around the clock."

@@ -1,31 +1,11 @@
import React from 'react';
import {format} from '../../services/time/timeService';
import {Banner, Card, Descriptions, Divider} from '@douyinfe/semi-ui';
import {IconBolt} from '@douyinfe/semi-icons';
import {Banner, Descriptions} from '@douyinfe/semi-ui';

export default function ProcessingTimes({processingTimes = {}}) {
const {Meta} = Card;
if (Object.keys(processingTimes).length === 0) {
return null;
}
if (processingTimes.error != null) {
return <Banner
fullMode={false}
type="danger"
closeIcon={null}
title={
<div style={{fontWeight: 600, fontSize: '14px', lineHeight: '20px'}}>
Scraping Ant Error
</div>
}
style={{marginBottom: '1rem'}}
description={
<div>
{processingTimes.error}
</div>
}
/>;
}
return (
<>
<Descriptions
@@ -47,44 +27,6 @@ export default function ProcessingTimes({processingTimes = {}}) {
</>
)}
</Descriptions>

{(processingTimes.scrapingAntData != null && Object.keys(processingTimes.scrapingAntData).length > 0) &&(
<>
<Divider margin="1rem"/>
<Card
style={{backgroundColor: '#35363c'}}
title={
<Meta
title="Remaining ScrapingAnt calls"
description="Information about your Scraping Ant Plan"
avatar={<IconBolt/>}
/>
}
>
<p>Plan: {processingTimes.scrapingAntData.plan_name}</p>
<p>
Duration: {format(new Date(processingTimes.scrapingAntData.start_date))} -{' '}
{format(new Date(processingTimes.scrapingAntData.end_date))}
<br/>
Credits: {processingTimes.scrapingAntData.remained_credits}/
{processingTimes.scrapingAntData.plan_total_credits}
</p>
If you want to scrape Immoscout or Immonet more often, you have to purchase a premium account
of{' '}
<a href="https://scrapingant.com/" target="_blank" rel="noreferrer">
ScrapingAnt
</a>
. You can use the code <b>FREDY10</b> to get 10% off. (No affiliation, we are <b>not</b> getting
paid by ScrapingAnt.)
</Card>
</>
)}
</>
);
}

/*



*/

@@ -96,17 +96,15 @@ export default function ProviderMutator({ onVisibilityChanged, visible = false,
fullMode={false}
type="warning"
closeIcon={null}
title={<div style={{ fontWeight: 600, fontSize: '14px', lineHeight: '20px' }}>ScrapingAnt</div>}
title={<div style={{ fontWeight: 600, fontSize: '14px', lineHeight: '20px' }}>Warning</div>}
style={{ marginBottom: '1rem' }}
description={
<div>
<p>
If you chose Immoscout, Immonet or NeubauKompass as a provider, make sure to also add the scrapingAnt apiKey to the config.json.
(See readme)
Immoscout will not work at the moment due to advanced bot detection. I'm currently working on a fix.
</p>
<p>
Do not forget to sort the results by date before copying the url to Fredy, so that Fredy always captures
the latest search results.
Until a fix has been released, Immoscout won't yield any results.
</p>
</div>
}