mirror of
https://github.com/orangecoding/fredy.git
synced 2026-06-16 12:31:07 +00:00
Compare commits
10 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | a3aa512db3 | | |
| | 8361d9c8ff | | |
| | ad7415f4f5 | | |
| | c97b323b35 | | |
| | ec986e4b18 | | |
| | 8d93581dfc | | |
| | b65c5d1a0c | | |
| | 57d295e882 | | |
| | 59e6d287fc | | |
| | 88c046dbd4 | | |
@@ -4,6 +4,7 @@ module.exports = {
es6: true,
node: true,
browser: true,
mocha: true,
},
parser: 'babel-eslint',
extends: ['eslint:recommended', 'prettier'],
@@ -11,6 +12,7 @@ module.exports = {
globals: {
Promise: false,
describe: true,
after: true,
it: true,
fetch: true,
},
12
CHANGELOG.md
@@ -1,8 +1,18 @@
###### [V5.3.0]
- Upgrading dependencies
- It's now possible to send mails to multiple receiver using comma separation for MailJet & Sendgrid
- Fixing Immowelt scraping

###### [V5.2.0]
- Upgrading dependencies
- Adding new similarity check layer (Duplicates are being removed now)
- Adding paging for search results

###### [V5.1.0]
- Upgrading dependencies
- NodeJS 12.13 is now the minimum supported version
- Adding general settings as new configuration page to ui
- Adding new feature working hours
- Adding new feature working hours

###### [V5.0.0]
- Upgrading dependencies
10
README.md
@@ -2,9 +2,11 @@
[](https://travis-ci.org/orangecoding/fredy)

_Fredy_ scrapes multiple services (Immonet, Immowelt etc.) as often as you want and send new listings to you once they appear. The list of available services can easily be extended. For your convenience, a ui helps you to configure your search jobs.
Searching an apartment in Germany can be quite frustrating. Not any longer as Fredy will take over and only notifies you once new listings have been found that matches your requirements.

If _Fredy_ found matching results, it will send them to you via Slack, Email, Telegram etc. (More adapter possible.) As _Fredy_ will store the listings it found, new results will not be sent twice (and as a side-effect, _Fredy_ can show some statistics..)
_Fredy_ scrapes multiple services (Immonet, Immowelt etc.) and send new listings to you once they appear. The list of available services can easily be extended. For your convenience, a ui helps you to configure your search jobs.

If _Fredy_ found matching results, it will send them to you via Slack, Email, Telegram etc. (More adapter possible.) As _Fredy_ will store the listings it has found, new results will not be sent twice (and as a side-effect, _Fredy_ can show some statistics..). Furthermore, _Fredy_ checks duplicates per scraping so that the same listings are not being sent when posted on various platforms. (Happens more often than one might think)

## Usage
@@ -15,7 +17,7 @@ yarn (or npm install)
yarn run prod
yarn run start
```
_Fredy_ will start with the default port, set to `9998`. You can access _Fredy_ by opening a browser `http://localhost:9998`. The default login is `admin` for username and password. (You should change the password asap when you plan to run Fredy on your server.)
_Fredy_ will start with the default port, set to `9998`. You can access _Fredy_ by opening a browser `http://localhost:9998`. The default login is `admin` both for username and password. (You should change the password asap when you plan to run Fredy on your server.)

<p align="center">
<img alt="Job Configuration" src="https://github.com/orangecoding/fredy/blob/master/doc/screenshot__1.png" width="30%">
@@ -29,7 +31,7 @@ _Fredy_ will start with the default port, set to `9998`. You can access _Fredy_
</p>

## Immoscout
I have added **EXPERIMENTAL** support for Immoscout. Immoscout is somewhat special, coz they have decided to secure their service from bots using Re-Capture. Finding a way around this is barely possible. For _Fredy_ to be able to bypass the check, I'm using a service called [ScrapingAnt](https://scrapingant.com/). The trick is to use a headless browser, rotating proxies and (once successful validated) re-send the cookies each time.
I have added **experimental** support for Immoscout. Immoscout is somewhat special, coz they have decided to secure their service from bots using Re-Capture. Finding a way around this is barely possible. For _Fredy_ to be able to bypass the check, I'm using a service called [ScrapingAnt](https://scrapingant.com/). The trick is to use a headless browser, rotating proxies and (once successful validated) re-send the cookies each time.

To be able to use Immoscout, you need to create an account at ScrapingAnt. Configure the ApiKey in the "General Settings" tab (visible when logged in as administrator).
The rest should be done by _Fredy_. Keep in mind, the support is experimental. There might be bugs and you might not always get pass the re-capture check, but most of the time it works pretty good :)
@@ -1 +1 @@
{"interval":"30","port":9998,"scrapingAnt":{"apiKey":""},"workingHours":{"from":"","to":""}}
{"interval":"60","port":9998,"scrapingAnt":{"apiKey":""},"workingHours":{"from":"","to":""}}
Binary file not shown (size before: 134 KiB, size after: 189 KiB).
11
index.js
@@ -9,8 +9,9 @@ const path = './lib/provider';
const provider = fs.readdirSync(path).filter((file) => file.endsWith('.js'));
const config = require('./conf/config.json');

const jobStorage = require('./lib/services/storage/jobStorage');
const similarityCache = require('./lib/services/similarity-check/similarityCache');
const { setLastJobExecution } = require('./lib/services/storage/listingsStorage');
const jobStorage = require('./lib/services/storage/jobStorage');
const FredyRuntime = require('./lib/FredyRuntime');

const { duringWorkingHoursOrNotSet } = require('./lib/utils');
@@ -50,7 +51,13 @@ setInterval(
throw new Error(`Provider Config for provider with id ${providerId} not found.`);
}
pro.init(providerConfig, job.blacklist);
await new FredyRuntime(pro.config, job.notificationAdapter, providerId, job.id).execute();
await new FredyRuntime(
pro.config,
job.notificationAdapter,
providerId,
job.id,
similarityCache
).execute();
setLastJobExecution(job.id);
});
});
@@ -1,4 +1,4 @@
const { NoNewListingsError } = require('./errors');
const { NoNewListingsWarning } = require('./errors');
const { setKnownListings, getKnownListings } = require('./services/storage/listingsStorage');

const notify = require('./notification/notify');
@@ -12,12 +12,14 @@ class FredyRuntime {
* @param notificationConfig the config for all notifications
* @param providerId the id of the provider currently in use
* @param jobKey key of the job that is currently running (from within the config)
* @param similarityCache cache instance holding values to check for similarity of entries
*/
constructor(providerConfig, notificationConfig, providerId, jobKey) {
constructor(providerConfig, notificationConfig, providerId, jobKey, similarityCache) {
this._providerConfig = providerConfig;
this._notificationConfig = notificationConfig;
this._providerId = providerId;
this._jobKey = jobKey;
this._similarityCache = similarityCache;
}

execute() {
@@ -33,6 +35,8 @@ class FredyRuntime {
.then(this._findNew.bind(this))
//store everything in db
.then(this._save.bind(this))
//check for similar listings. if found, remove them before notifying
.then(this._filterBySimilarListings.bind(this))
//notify the user using the configured notification adapter
.then(this._notify.bind(this))
//if an error occurred on the way, handle it here.
@@ -53,14 +57,29 @@ class FredyRuntime {
}
const u = scrapingAnt.isImmoscout(id) ? scrapingAnt.transformUrlForScrapingAnt(url, id) : url;
try {
xray(u, this._providerConfig.crawlContainer, [this._providerConfig.crawlFields])
.then((listings) => {
resolve(listings == null ? [] : listings);
})
.catch((err) => {
reject(err);
console.error(err);
});
if (this._providerConfig.paginate != null) {
xray(u, this._providerConfig.crawlContainer, [this._providerConfig.crawlFields])
//the first 2 pages should be enough here
//TODO: Think about automagically sort by date
.limit(2)
.paginate(this._providerConfig.paginate)
.then((listings) => {
resolve(listings == null ? [] : listings);
})
.catch((err) => {
reject(err);
console.error(err);
});
} else {
xray(u, this._providerConfig.crawlContainer, [this._providerConfig.crawlFields])
.then((listings) => {
resolve(listings == null ? [] : listings);
})
.catch((err) => {
reject(err);
console.error(err);
});
}
} catch (error) {
reject(error);
console.error(error);
@@ -80,13 +99,16 @@ class FredyRuntime {
const newListings = listings.filter((o) => getKnownListings(this._jobKey, this._providerId)[o.id] == null);

if (newListings.length === 0) {
throw new NoNewListingsError();
throw new NoNewListingsWarning();
}

return newListings;
}

_notify(newListings) {
if (newListings.length === 0) {
throw new NoNewListingsWarning();
}
const sendNotifications = notify.send(this._providerId, newListings, this._notificationConfig, this._jobKey);
return Promise.all(sendNotifications).then(() => newListings);
}
@@ -100,8 +122,22 @@ class FredyRuntime {
return newListings;
}

_filterBySimilarListings(listings) {
const filteredList = listings.filter((listing) => {
const similar = this._similarityCache.hasSimilarEntries(this._jobKey, listing.title);
if (similar) {
/* eslint-disable no-console */
console.debug(`Filtering similar entry for job with id ${this._jobKey} with title: `, listing.title);
/* eslint-enable no-console */
}
return !similar;
});
filteredList.forEach((filter) => this._similarityCache.addCacheEntry(this._jobKey, filter.title));
return filteredList;
}

_handleError(err) {
if (err.name !== 'NoNewListingsError') console.error(err);
if (err.name !== 'NoNewListingsWarning') console.error(err);
}
}
@@ -10,6 +10,6 @@ class ExtendableError extends Error {
}
}

class NoNewListingsError extends ExtendableError {}
class NoNewListingsWarning extends ExtendableError {}

module.exports = { NoNewListingsError };
module.exports = { NoNewListingsWarning };
@@ -21,6 +21,13 @@ exports.send = ({ serviceName, newListings, notificationConfig, jobKey }) => {
(adapter) => adapter.id === 'mailJet'
).fields;

const to = receiver
.trim()
.split(',')
.map((r) => ({
Email: r.trim(),
}));

return mailjet
.connect(apiPublicKey, apiPrivateKey)
.post('send', { version: 'v3.1' })
@@ -31,11 +38,7 @@ exports.send = ({ serviceName, newListings, notificationConfig, jobKey }) => {
Email: from,
Name: 'Fredy',
},
To: [
{
Email: receiver,
},
],
To: to,
Subject: `Fredy found ${newListings.length} new listings for ${serviceName}`,
HTMLPart: emailTemplate({
serviceName: `Job: (${jobKey}) | Service: ${serviceName}`,
@@ -4,3 +4,5 @@ To use [MailJet](https://mailjet.com), you need to create an account. You'll nee
E.g. if you use yourGmailAccount@gmail.com, you have to add this to MailJet and verify it as well.
The given public/private api keys are needed in order to use MailJet with Fredy. Fredy will use the same template, it is using for SendGrid.

If this email should be sent to multiple receiver use a comma separator (some@email.com, someOther@email.com).
|
||||
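The comma-separated receiver handling that this changeset adds to the MailJet and SendGrid adapters can be tried in isolation. The following is a minimal sketch, not code from the commits: `parseReceivers` and `toMailJetRecipients` are hypothetical helper names that only repeat the trim, split, and map steps visible in the diff.

```javascript
// Hypothetical helper: split a comma-separated receiver string and trim each address.
function parseReceivers(receiver) {
  return receiver
    .trim()
    .split(',')
    .map((r) => r.trim());
}

// Hypothetical helper: wrap each address in the { Email } object shape
// that the MailJet hunk builds for its `To` field.
function toMailJetRecipients(receiver) {
  return parseReceivers(receiver).map((address) => ({ Email: address }));
}

console.log(parseReceivers(' some@email.com, someOther@email.com '));
console.log(toMailJetRecipients('some@email.com,someOther@email.com'));
```

Note that a trailing comma produces an empty string entry, because nothing in this sketch filters empty values.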
@@ -14,7 +14,10 @@ exports.send = ({ serviceName, newListings, notificationConfig, jobKey }) => {
sgMail.setApiKey(apiKey);
const msg = {
templateId,
to: receiver,
to: receiver
.trim()
.split(',')
.map((r) => r.trim()),
from,
subject: `Job ${jobKey} | Service ${serviceName} found ${newListings.length} new listing(s)`,
dynamic_template_data: {
@@ -6,3 +6,5 @@ SendGrid is a free email service (free as in "you cannot send more than 100(Send
To use [SendGrid](https://sendgrid.com/), you need to create an account. You'll need to decided from which email address you want Fredy to send from. E.g. if you use yourGmailAccount@gmail.com, you have to add this to sendgrid and verify it as well.

Lastly you have to create an api-key and feed it into Fredy's config, as well as creating a new dynamic template. For this new template, I recommend copying and pasting the code from the one I have provided under `/lib/notification/emailTemplate/template.hbs`.

If this email should be sent to multiple receiver use a comma separator (some@email.com, someOther@email.com).
@@ -1,6 +1,19 @@
const { markdown2Html } = require('../../services/markdown');
const axios = require('axios');

/**
* splitting an array into chunks because Telegram only allows for messages up to
* 4096 chars, thus we have to split messages into chunks
* @param inputArray
* @param perChunk
*/
const arrayChunks = (inputArray, perChunk) =>
inputArray.reduce((all, one, i) => {
const ch = Math.floor(i / perChunk);
all[ch] = [].concat(all[ch] || [], one);
return all;
}, []);

/**
* sends new listings to telegram
* @param serviceName e.g immowelt
@@ -12,22 +25,28 @@ const axios = require('axios');
exports.send = ({ serviceName, newListings, notificationConfig, jobKey }) => {
const { token, chatId } = notificationConfig.find((adapter) => adapter.id === 'telegram').fields;

let message = `Job: ${jobKey} | Service <b>${serviceName}</b> found <b>${newListings.length}</b> new listings:\n\n`;
//we have to split messages into chunk, because otherwise messages are going to become too big and will fail
const chunks = arrayChunks(newListings, 3);

message += newListings.map(
(o) =>
`<b>${shorten(o.title.replace(/\*/g, ''), 45)}</b>\n` +
[o.address, o.price, o.size].join(' | ') +
'\n' +
`<a href="${o.link}">${o.link}</a>\n\n`
);
const promises = chunks.map((chunk) => {
let message = `Job: ${jobKey} | Service <b>${serviceName}</b> found <b>${newListings.length}</b> new listings:\n\n`;
message += chunk.map(
(o) =>
`<b>${shorten(o.title.replace(/\*/g, ''), 45)}</b>\n` +
[o.address, o.price, o.size].join(' | ') +
'\n' +
`<a href="${o.link}">${o.link}</a>\n\n`
);

return axios.post(`https://api.telegram.org/bot${token}/sendMessage`, {
chat_id: chatId,
text: message,
parse_mode: 'HTML',
disable_web_page_preview: true,
return axios.post(`https://api.telegram.org/bot${token}/sendMessage`, {
chat_id: chatId,
text: message,
parse_mode: 'HTML',
disable_web_page_preview: true,
});
});

return Promise.all(promises);
};

function shorten(str, len = 30) {
|
||||
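The chunking helper added to the Telegram adapter above can be exercised on its own. The following is a minimal sketch rather than part of the commits: `arrayChunks` is copied from the diff, while `buildMessages` and the sample listings are illustrative assumptions.

```javascript
// Copied from the diff: groups the input into arrays of at most `perChunk` items.
const arrayChunks = (inputArray, perChunk) =>
  inputArray.reduce((all, one, i) => {
    const ch = Math.floor(i / perChunk);
    all[ch] = [].concat(all[ch] || [], one);
    return all;
  }, []);

// Hypothetical helper (not in the commits): builds one message body per chunk,
// so that each message stays well below Telegram's 4096 character limit.
function buildMessages(listings, perChunk) {
  return arrayChunks(listings, perChunk).map((chunk) =>
    chunk.map((o) => `${o.title}\n${o.link}\n\n`).join('')
  );
}

// Made-up sample data for illustration only.
const sampleListings = [1, 2, 3, 4, 5, 6, 7].map((n) => ({
  title: `Listing ${n}`,
  link: `https://example.com/listing/${n}`,
}));

const messages = buildMessages(sampleListings, 3);
console.log(messages.length);
```

Joining the per-listing strings with `join('')` is a choice made in this sketch; it avoids the commas that appear when an array is concatenated onto a string directly.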
@@ -30,7 +30,6 @@ const config = {
title: '.tabelle .inner_object_data .tabelle_inhalt_titel_black | removeNewline | trim',
description: '.tabelle .inner_object_data .objekt_beschreibung | removeNewline | trim',
},
paginate: '.pagination_blocks div:last a@href',
normalize: normalize,
filter: applyBlacklist,
};
@@ -2,9 +2,13 @@ const utils = require('../utils');
let appliedBlackList = [];

function nullOrEmpty(val) {
return val == null || val.length === 0;
}

function normalize(o) {
const title = o.title.replace('NEU', '');
const address = (o.address || '').replace(/\(.*\),.*$/, '').trim();
const title = nullOrEmpty(o.title) ? 'NO TITLE FOUND' : o.title.replace('NEU', '');
const address = nullOrEmpty(o.address) ? 'NO ADDRESS FOUND' : (o.address || '').replace(/\(.*\),.*$/, '').trim();
const link = `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
return Object.assign(o, { title, address, link });
}
|
||||
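To see what the new fallbacks in the Immoscout `normalize` function do, the updated version can be run against a hand-written listing. This is a small usage sketch: `nullOrEmpty` and `normalize` follow the new lines of the hunk above, while the sample objects and their values are made up for illustration.

```javascript
// Taken from the new side of the hunk above.
function nullOrEmpty(val) {
  return val == null || val.length === 0;
}

function normalize(o) {
  const title = nullOrEmpty(o.title) ? 'NO TITLE FOUND' : o.title.replace('NEU', '');
  const address = nullOrEmpty(o.address) ? 'NO ADDRESS FOUND' : (o.address || '').replace(/\(.*\),.*$/, '').trim();
  const link = `https://www.immobilienscout24.de${o.link.substring(o.link.indexOf('/expose'))}`;
  return Object.assign(o, { title, address, link });
}

// Made-up scraped listing with an empty title.
const withoutTitle = normalize({
  title: '',
  address: 'Musterstr. 1 (Mitte), Berlin',
  link: 'https://example.com/some/path/expose/123',
});

// Made-up scraped listing with a missing address.
const withoutAddress = normalize({
  title: 'Helle Wohnung',
  address: null,
  link: '/expose/456',
});

console.log(withoutTitle);
console.log(withoutAddress);
```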
@@ -3,10 +3,7 @@ const utils = require('../utils');
let appliedBlackList = [];

function normalize(o) {
const size = o.size == null ? '--- m²' : o.size.split('Wohnfläche')[1].replace(' (ca.) ', '');
const address = o.address;

return Object.assign(o, { size, address });
return o;
}

function applyBlacklist(o) {
@@ -18,14 +15,14 @@ function applyBlacklist(o) {
const config = {
url: null,
crawlContainer: '.immoliste .js-object.listitem_wrap ',
crawlContainer: "div[class^='EstateItem-']",
crawlFields: {
id: '@data-estateid | int',
price: '.hardfacts_3 strong | removeNewline | trim',
size: '.js-object.listitem_wrap .hardfacts_3 div:nth-child(2)| removeNewline | trim',
title: '.listcontent.clear h2',
id: 'a@id',
price: "div[class^='KeyFacts-'] [data-test='price'] | removeNewline | trim",
size: "div[class^='KeyFacts-'] [data-test='area'] | removeNewline | trim",
title: "div[class^='FactsMain-'] h2",
link: 'a@href',
address: '.listcontent .details .listlocation| removeNewline | trim',
address: "div[class^='estateFacts-'] span | removeNewline | trim",
},
paginate: '#pnlPaging #nlbPlus@href',
normalize: normalize,
@@ -20,7 +20,7 @@ function applyBlacklist(o) {
const config = {
url: null,
crawlContainer: '#srchrslt-adtable .ad-listitem',
crawlContainer: '#srchrslt-adtable .ad-listitem ',
crawlFields: {
id: '.aditem@data-adid | int',
price: '.aditem-main--middle--price | removeNewline | trim',
@@ -24,7 +24,6 @@ const config = {
title: '.truncate_title a |removeNewline |trim',
link: '.truncate_title a@href',
},
paginate: '.pagination-sm:first a:last@href',
normalize: normalize,
filter: applyBlacklist,
};
@@ -1,4 +1,7 @@
const axios = require('axios');
const axiosRetry = require('axios-retry');

axiosRetry(axios, { retryDelay: axiosRetry.exponentialDelay, retries: 3 });

function makeDriver(headers = {}) {
let cookies = '';
@@ -15,7 +18,8 @@ function makeDriver(headers = {}) {
},
});
} catch (exception) {
callback(exception, null);
console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
callback(null, []);
}

if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
36
lib/services/similarity-check/SimilarityCacheEntry.js
Normal file
@@ -0,0 +1,36 @@
const stringSimilarity = require('string-similarity');

//if the score is higher than this, it will be considered a match
const MAX_DICE_INDEX = 0.7;

/**
 * The similarity check is based on the dice coefficient. => https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
 *
 * @type {module.SimilarityCacheEntry}
 */
module.exports = class SimilarityCacheEntry {
  constructor(time) {
    this.time = time;
    this.values = [];
  }

  setCacheEntry = (entry) => {
    this.values.push(entry);
  };

  getTime = () => {
    return this.time;
  };

  hasSimilarEntries = (value) => {
    if (this.values.length > 0) {
      for (let i = 0; i < this.values.length; i++) {
        const index = stringSimilarity.compareTwoStrings(value, this.values[i]);
        if (index >= MAX_DICE_INDEX) {
          return true;
        }
      }
    }
    return false;
  };
};
|
||||
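The check in `SimilarityCacheEntry.js` delegates the scoring to the `string-similarity` package, which the file describes as based on the Sørensen-Dice coefficient. The sketch below is a simplified bigram implementation written only to illustrate the idea; it is not the library's code and may differ from it in edge cases. `bigramCounts`, `diceCoefficient`, `isSimilar`, and the sample titles are hypothetical; the 0.7 threshold mirrors `MAX_DICE_INDEX` from the diff.

```javascript
// Threshold taken from MAX_DICE_INDEX in the diff.
const THRESHOLD = 0.7;

// Count the character bigrams of a string, ignoring whitespace (an assumption of this sketch).
function bigramCounts(text) {
  const s = text.replace(/\s+/g, '');
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const gram = s.substring(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice coefficient: 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|).
function diceCoefficient(a, b) {
  if (a === b) return 1;
  const first = bigramCounts(a);
  const second = bigramCounts(b);
  let sizeA = 0;
  let sizeB = 0;
  let overlap = 0;
  for (const n of first.values()) sizeA += n;
  for (const [gram, n] of second) {
    sizeB += n;
    overlap += Math.min(n, first.get(gram) || 0);
  }
  if (sizeA + sizeB === 0) return 0;
  return (2 * overlap) / (sizeA + sizeB);
}

// Hypothetical wrapper applying the threshold the same way the diff does (>=).
function isSimilar(a, b) {
  return diceCoefficient(a, b) >= THRESHOLD;
}

console.log(diceCoefficient('night', 'nacht'));
console.log(isSimilar('Helle 3-Zimmer-Wohnung in Köln', 'Helle 3-Zimmer-Wohnung in Köln!'));
console.log(isSimilar('Helle 3-Zimmer-Wohnung in Köln', 'Garage zu vermieten'));
```

For "night" and "nacht" only the bigram "ht" is shared, so the score is 2 * 1 / (4 + 4) = 0.25, well below the threshold; two titles that differ by a single trailing character score close to 1 and would be filtered as duplicates.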
63
lib/services/similarity-check/similarityCache.js
Normal file
@@ -0,0 +1,63 @@
/**
 * each job that runs scrapes all provider. This cache holds the titles of the found listing(s) and provides
 * a similarity check. if this check returns true, it will not be forwarded to the notification adapter, thus
 * the user won't see any duplicates
 *
 * The retention of this cache is per default 5 minutes, but can be smaller if the interval is > 5 mins.
 *
 * @type {module.SimilarityCacheEntry|{}}
 */
const SimilarityCacheEntry = require('./SimilarityCacheEntry');
const config = require('../../../conf/config.json');

//5 minutes
let retention = 5 * 60 * 1000;

const intervalInMs = config.interval * 60 * 1000;
//an interval below 5 mins sounds crazy, but there are ppl out there doing crazy shit.
if (intervalInMs <= retention) {
  retention = Math.floor(intervalInMs / 2);
}

//jobid -> SimilarityCacheEntry
const cache = {};

let intervalId;

exports.addCacheEntry = (jobId, value) => {
  cache[jobId] = cache[jobId] || new SimilarityCacheEntry(Date.now());
  cache[jobId].setCacheEntry(value);
};

exports.hasSimilarEntries = (jobId, value) => {
  if (cache[jobId] == null) {
    return false;
  }

  return cache[jobId].hasSimilarEntries(value);
};

/**
 * cleanup
 */
intervalId = setInterval(() => {
  const keysToBeRemoved = [];
  const now = Date.now();

  Object.keys(cache).forEach((key) => {
    if (cache[key].getTime() + retention < now) {
      keysToBeRemoved.push(key);
    }
  });

  if (keysToBeRemoved.length > 0) {
    keysToBeRemoved.forEach((key) => delete cache[key]);
  }
}, 10000);

/**
 * mostly used for tests
 */
exports.stopCacheCleanup = () => {
  clearInterval(intervalId);
};
|
||||
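The retention logic in `similarityCache.js` can be expressed as two small pure functions, which makes it easier to check by hand. This is a sketch under assumptions: `computeRetention` and `isExpired` are hypothetical names, and the numbers follow the code in the diff, where the retention is halved relative to the interval only when the interval is at or below five minutes.

```javascript
// Default retention from the diff: 5 minutes in milliseconds.
const DEFAULT_RETENTION_MS = 5 * 60 * 1000;

// Hypothetical helper: derive the retention from the configured interval (in minutes).
// The interval may arrive as a string such as "60", as in conf/config.json.
function computeRetention(intervalMinutes) {
  const intervalInMs = Number(intervalMinutes) * 60 * 1000;
  if (intervalInMs <= DEFAULT_RETENTION_MS) {
    return Math.floor(intervalInMs / 2);
  }
  return DEFAULT_RETENTION_MS;
}

// Hypothetical helper: the same comparison the cleanup timer applies to each cache entry.
function isExpired(createdAt, retention, now) {
  return createdAt + retention < now;
}

console.log(computeRetention('60')); // interval above five minutes
console.log(computeRetention(4)); // interval below five minutes
console.log(isExpired(0, computeRetention('60'), 300001));
```

Keeping the calculation free of timers and module state is a design choice of this sketch; the file in the diff computes the value once at load time and runs the cleanup every 10 seconds.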
@@ -61,12 +61,18 @@ exports.setJobStatus = ({ jobId, status }) => {
};

exports.removeJob = (jobId) => {
listingStorage.removeListings(jobId);
db.get('jobs')
.remove((job) => job.id === jobId)
.write();
};

exports.removeJobsByUserId = (userId) => {
db.get('jobs')
.value()
.filter((job) => job.userId === userId)
.forEach((job) => listingStorage.removeListings(job.id));

db.get('jobs')
.remove((job) => job.userId === userId)
.write();
@@ -47,3 +47,7 @@ exports.setLastJobExecution = (jobId) => {
const key = buildKey(jobId, null, 'lastExecution');
return db.set(key, Date.now()).write();
};

exports.removeListings = (jobId) => {
db.unset(jobId).write();
};
59
package.json
@@ -1,6 +1,6 @@
{
"name": "fredy",
"version": "5.1.0",
"version": "5.3.2",
"description": "[F]ind [R]eal [E]states [d]amn eas[y].",
"scripts": {
"start": "node index.js",
@@ -32,6 +32,7 @@
"house",
"rent",
"immoscout",
"scraper",
"immonet",
"immowelt",
"immobilienscout24"
@@ -51,61 +52,63 @@
"Firefox ESR"
],
"dependencies": {
"@rematch/core": "2.0.1",
"@rematch/loading": "2.0.1",
"@sendgrid/mail": "7.4.4",
"axios": "0.21.1",
"@rematch/core": "2.1.0",
"@rematch/loading": "2.1.0",
"@sendgrid/mail": "7.4.7",
"axios": "0.24.0",
"axios-retry": "^3.2.4",
"body-parser": "1.19.0",
"cookie-session": "1.4.0",
"handlebars": "4.7.7",
"highcharts": "9.1.0",
"highcharts": "9.2.2",
"highcharts-react-official": "3.0.0",
"lowdb": "1.0.0",
"markdown": "^0.5.0",
"nanoid": "3.1.23",
"nanoid": "3.1.28",
"node-mailjet": "3.3.4",
"react": "17.0.2",
"react-dom": "17.0.2",
"react-redux": "7.2.4",
"react-router": "5.2.0",
"react-router-dom": "5.2.0",
"react-redux": "7.2.5",
"react-router": "5.2.1",
"react-router-dom": "5.3.0",
"react-switch": "^6.0.0",
"redux": "4.1.0",
"redux": "4.1.1",
"redux-thunk": "2.3.0",
"restana": "4.9.1",
"semantic-ui-react": "2.0.3",
"semantic-ui-react": "2.0.4",
"serve-static": "^1.14.1",
"slack": "11.0.2",
"string-similarity": "^4.0.4",
"x-ray": "2.3.4"
},
"devDependencies": {
"@babel/core": "7.14.3",
"@babel/preset-env": "7.14.2",
"@babel/preset-react": "7.13.13",
"@babel/core": "7.15.5",
"@babel/preset-env": "7.15.6",
"@babel/preset-react": "7.14.5",
"babel-eslint": "10.1.0",
"babel-loader": "8.2.2",
"chai": "4.3.4",
"clean-webpack-plugin": "3.0.0",
"copy-webpack-plugin": "9.0.0",
"css-loader": "5.2.6",
"eslint": "7.27.0",
"clean-webpack-plugin": "4.0.0",
"copy-webpack-plugin": "9.0.1",
"css-loader": "6.3.0",
"eslint": "7.32.0",
"eslint-config-prettier": "8.3.0",
"eslint-plugin-react": "7.23.2",
"eslint-plugin-react": "7.26.1",
"file-loader": "6.2.0",
"history": "5.0.0",
"history": "5.0.1",
"husky": "4.3.8",
"less": "4.1.1",
"less-loader": "9.0.0",
"lint-staged": "11.0.0",
"mocha": "8.4.0",
"prettier": "2.3.0",
"less-loader": "10.0.1",
"lint-staged": "11.1.2",
"mocha": "9.1.2",
"prettier": "2.4.1",
"proxyquire": "2.1.3",
"redux-logger": "3.0.6",
"style-loader": "2.0.0",
"style-loader": "3.3.0",
"url-loader": "4.1.1",
"webpack": "5.37.1",
"webpack": "5.56.0",
"webpack-cli": "3.3.12",
"webpack-dev-server": "3.11.2",
"webpack-merge": "5.7.3"
"webpack-merge": "5.8.0"
}
}
@@ -1,3 +1,4 @@
const similarityCache = require('../../lib/services/similarity-check/similarityCache');
const mockNotification = require('../mocks/mockNotification');
const providerConfig = require('./testProvider.json');
const mockStore = require('../mocks/mockStore');
@@ -6,6 +7,10 @@ const expect = require('chai').expect;
const provider = require('../../lib/provider/einsAImmobilien');

describe('#einsAImmobilien testsuite()', () => {
after(() => {
similarityCache.stopCacheCleanup();
});

provider.init(providerConfig.einsAImmobilien, [], []);

const Fredy = proxyquire('../../lib/FredyRuntime', {
@@ -17,7 +22,7 @@ describe('#einsAImmobilien testsuite()', () => {
it('should test einsAImmobilien provider', async () => {
return await new Promise((resolve) => {
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1');
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1', similarityCache);
fredy.execute().then((listings) => {
expect(listings).to.be.a('array');
@@ -1,3 +1,4 @@
const similarityCache = require('../../lib/services/similarity-check/similarityCache');
const mockNotification = require('../mocks/mockNotification');
const providerConfig = require('./testProvider.json');
const mockStore = require('../mocks/mockStore');
@@ -6,6 +7,10 @@ const expect = require('chai').expect;
const provider = require('../../lib/provider/immonet');

describe('#immonet testsuite()', () => {
after(() => {
similarityCache.stopCacheCleanup();
});

provider.init(providerConfig.immonet, [], []);
const Fredy = proxyquire('../../lib/FredyRuntime', {
'./services/storage/listingsStorage': {
@@ -16,7 +21,7 @@ describe('#immonet testsuite()', () => {
it('should test immonet provider', async () => {
return await new Promise((resolve) => {
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1');
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1', similarityCache);
fredy.execute().then((listing) => {
expect(listing).to.be.a('array');
@@ -1,3 +1,4 @@
const similarityCache = require('../../lib/services/similarity-check/similarityCache');
const mockNotification = require('../mocks/mockNotification');
const providerConfig = require('./testProvider.json');
const mockStore = require('../mocks/mockStore');
@@ -7,6 +8,9 @@ const provider = require('../../lib/provider/immoscout');
const scrapingAnt = require('../../lib/services/scrapingAnt');

describe('#immoscout testsuite()', () => {
after(() => {
similarityCache.stopCacheCleanup();
});
provider.init(providerConfig.immoscout, [], []);
const Fredy = proxyquire('../../lib/FredyRuntime', {
'./services/storage/listingsStorage': {
@@ -25,7 +29,7 @@ describe('#immoscout testsuite()', () => {
return;
}

const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1');
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1', similarityCache);
fredy.execute().then((listing) => {
expect(listing).to.be.a('array');
@@ -1,3 +1,4 @@
const similarityCache = require('../../lib/services/similarity-check/similarityCache');
const mockNotification = require('../mocks/mockNotification');
const providerConfig = require('./testProvider.json');
const mockStore = require('../mocks/mockStore');
@@ -6,6 +7,9 @@ const expect = require('chai').expect;
const provider = require('../../lib/provider/immowelt');

describe('#immowelt testsuite()', () => {
after(() => {
similarityCache.stopCacheCleanup();
});
it('should test immowelt provider', async () => {
provider.init(providerConfig.immowelt, [], []);
const Fredy = proxyquire('../../lib/FredyRuntime', {
@@ -16,7 +20,7 @@ describe('#immowelt testsuite()', () => {
|
||||
});
|
||||
|
||||
return await new Promise((resolve) => {
|
||||
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1');
|
||||
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1', similarityCache);
|
||||
fredy.execute().then((listing) => {
|
||||
expect(listing).to.be.a('array');
|
||||
|
||||
@@ -26,7 +30,7 @@ describe('#immowelt testsuite()', () => {
|
||||
|
||||
notificationObj.payload.forEach((notify) => {
|
||||
/** check the actual structure **/
|
||||
expect(notify.id).to.be.a('number');
|
||||
expect(notify.id).to.be.a('string');
|
||||
expect(notify.price).to.be.a('string');
|
||||
expect(notify.size).to.be.a('string');
|
||||
expect(notify.title).to.be.a('string');
|
||||
|
||||
@@ -1,3 +1,4 @@
+const similarityCache = require('../../lib/services/similarity-check/similarityCache');
 const mockNotification = require('../mocks/mockNotification');
 const providerConfig = require('./testProvider.json');
 const mockStore = require('../mocks/mockStore');
@@ -6,6 +7,9 @@ const expect = require('chai').expect;
 const provider = require('../../lib/provider/kleinanzeigen');

 describe('#kleinanzeigen testsuite()', () => {
+  after(() => {
+    similarityCache.stopCacheCleanup();
+  });
   it('should test kleinanzeigen provider', async () => {
     provider.init(providerConfig.kleinanzeigen, [], []);
     const Fredy = proxyquire('../../lib/FredyRuntime', {
@@ -16,7 +20,7 @@ describe('#kleinanzeigen testsuite()', () => {
     });

     return await new Promise((resolve) => {
-      const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1');
+      const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1', similarityCache);
       fredy.execute().then((listing) => {
         expect(listing).to.be.a('array');

@@ -1,3 +1,4 @@
+const similarityCache = require('../../lib/services/similarity-check/similarityCache');
 const mockNotification = require('../mocks/mockNotification');
 const providerConfig = require('./testProvider.json');
 const mockStore = require('../mocks/mockStore');
@@ -6,6 +7,9 @@ const expect = require('chai').expect;
 const provider = require('../../lib/provider/neubauKompass');

 describe('#neubauKompass testsuite()', () => {
+  after(() => {
+    similarityCache.stopCacheCleanup();
+  });
   provider.init(providerConfig.neubauKompass, [], []);
   const Fredy = proxyquire('../../lib/FredyRuntime', {
     './services/storage/listingsStorage': {
@@ -16,7 +20,7 @@ describe('#neubauKompass testsuite()', () => {

   it('should test neubauKompass provider', async () => {
     return await new Promise((resolve) => {
-      const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1');
+      const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1', similarityCache);
       fredy.execute().then((listing) => {
         expect(listing).to.be.a('array');

@@ -9,7 +9,7 @@
     "enabled": true
   },
   "immowelt": {
-    "url": "https://www.immowelt.de/liste/duesseldorf-benrath/wohnungen/kaufen?geoid=10805111000004%2C10805111000005%2C10805111000006%2C10805111000007%2C10805111000009%2C10805111000010%2C10805111000011%2C10805111000013%2C10805111000014%2C10805111000015%2C10805111000016%2C10805111000017%2C10805111000018%2C10805111000019%2C10805111000023%2C10805111000024%2C10805111000027%2C10805111000032%2C10805111000034%2C10805111000035%2C10805111000039%2C10805111000041%2C10805111000042%2C10805111000043%2C10805111000047%2C10805111000048%2C10805111000049%2C10805111000051%2C10805111000052%2C10805111000053&roomi=3&prima=420000&wflmi=90&sort=createdate%2Bdesc",
+    "url": "https://www.immowelt.de/liste/duesseldorf/wohnungen/kaufen?d=true&rmi=3&sd=DESC&sf=TIMESTAMP&sp=1",
     "enabled": true
   },
   "immoscout": {
@@ -21,7 +21,7 @@
     "enabled": true
   },
   "kleinanzeigen": {
-    "url": "https://www.ebay-kleinanzeigen.de/s-wohnung-kaufen/duesseldorf/anzeige:angebote/preis::420000/wohnung/k0c196l2068r5+wohnung_kaufen.qm_d:90,+wohnung_kaufen.zimmer_d:3.5,",
+    "url": "https://www.ebay-kleinanzeigen.de/s-immobilien/duesseldorf/anzeige:angebote/wohnung/k0c195l2068r5",
     "enabled": true
   },
   "neubauKompass": {

@@ -1,3 +1,4 @@
+const similarityCache = require('../../lib/services/similarity-check/similarityCache');
 const mockNotification = require('../mocks/mockNotification');
 const providerConfig = require('./testProvider.json');
 const mockStore = require('../mocks/mockStore');
@@ -6,6 +7,9 @@ const expect = require('chai').expect;
 const provider = require('../../lib/provider/wgGesucht');

 describe('#wgGesucht testsuite()', () => {
+  after(() => {
+    similarityCache.stopCacheCleanup();
+  });
   provider.init(providerConfig.wgGesucht, [], []);
   const Fredy = proxyquire('../../lib/FredyRuntime', {
     './services/storage/listingsStorage': {
@@ -16,7 +20,7 @@ describe('#wgGesucht testsuite()', () => {

   it('should test wgGesucht provider', async () => {
     return await new Promise((resolve) => {
-      const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1');
+      const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1', similarityCache);
       fredy.execute().then((listing) => {
         expect(listing).to.be.a('array');
         const notificationObj = mockNotification.get();

39
test/similarity/similarity.test.js
Normal file
@@ -0,0 +1,39 @@
+const SimilarityCacheEntry = require('../../lib/services/similarity-check/SimilarityCacheEntry');
+const expect = require('chai').expect;
+
+describe('similarityCheck', () => {
+  describe('#similarityCheck()', () => {
+    it('should be false', () => {
+      const check = new SimilarityCacheEntry(0);
+      check.setCacheEntry('Hallo');
+      expect(check.hasSimilarEntries('Welt')).to.be.false;
+    });
+    it('should be true', () => {
+      const check = new SimilarityCacheEntry(0);
+      check.setCacheEntry('Hallo');
+      expect(check.hasSimilarEntries('hallo')).to.be.true;
+    });
+    it('should be true', () => {
+      const check = new SimilarityCacheEntry(0);
+      check.setCacheEntry('Selling an incredible house in san francisco');
+      expect(check.hasSimilarEntries('incredible house in san francisco for sale')).to.be.true;
+    });
+    it('should be true', () => {
+      const check = new SimilarityCacheEntry(0);
+      check.setCacheEntry('a');
+      check.setCacheEntry('b');
+      check.setCacheEntry('c');
+      check.setCacheEntry('d');
+      expect(check.hasSimilarEntries('b')).to.be.true;
+    });
+    it('should be false', () => {
+      const check = new SimilarityCacheEntry(0);
+      check.setCacheEntry(
+        'The index is known by several other names, especially Sørensen–Dice index,[3] Sørensen index and Dice\'s coefficient. Other variations include the "similarity coefficient" or "index", such as Dice similarity coefficient (DSC). Common alternate spellings for Sørensen are Sorenson, Soerenson and Sörenson, and all three can also be seen with the –sen ending.'
+      );
+      check.setCacheEntry(
+        'where |X| and |Y| are the cardinalities of the two sets (i.e. the number of elements in each set). The Sørensen index equals twice the number of elements common to both sets divided by the sum of the number of elements in each set.'
+      );
+    });
+  });
+});
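
Note on the similarity tests above: the fixture strings quote the definition of the Sørensen-Dice index, which suggests the new similarity layer is based on that coefficient. The diff does not show `SimilarityCacheEntry` itself, so the following is only a minimal sketch of such a check under stated assumptions. `normalize`, `bigramCounts`, `diceCoefficient`, `isSimilar`, the choice of character bigrams and the 0.5 threshold are all hypothetical and are not taken from Fredy's code.

```javascript
// Illustrative sketch only. All names and the 0.5 threshold are assumptions;
// this is NOT the implementation behind Fredy's SimilarityCacheEntry.

// Case-fold and collapse whitespace so 'Hallo' and 'hallo' compare equal.
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// Count overlapping character bigrams of the normalized text (a multiset).
function bigramCounts(text) {
  const normalized = normalize(text);
  const counts = new Map();
  for (let i = 0; i < normalized.length - 1; i++) {
    const gram = normalized.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Sørensen-Dice coefficient: 2 * |X ∩ Y| / (|X| + |Y|), here over bigrams.
function diceCoefficient(a, b) {
  // Equal strings are a perfect match; this also covers one-character
  // inputs, which have no bigrams at all.
  if (normalize(a) === normalize(b)) return 1;

  const gramsA = bigramCounts(a);
  const gramsB = bigramCounts(b);
  const sizeOf = (grams) => [...grams.values()].reduce((sum, n) => sum + n, 0);
  const sizeA = sizeOf(gramsA);
  const sizeB = sizeOf(gramsB);
  if (sizeA === 0 || sizeB === 0) return 0;

  // Multiset intersection: each bigram counts as often as it occurs in both.
  let overlap = 0;
  for (const [gram, countA] of gramsA) {
    overlap += Math.min(countA, gramsB.get(gram) || 0);
  }
  return (2 * overlap) / (sizeA + sizeB);
}

// True if any cached entry is at least `threshold` similar to the candidate.
function isSimilar(entries, candidate, threshold = 0.5) {
  return entries.some((entry) => diceCoefficient(entry, candidate) >= threshold);
}
```

With this sketch, `isSimilar(['Hallo'], 'Welt')` is false because the two words share no bigrams, while a reordered listing title still scores high because most of its bigrams are shared.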

@@ -40,9 +40,8 @@ export default function Login() {

   return (
     <div className="login">
-      <div className="login__bgImage" style={{ background: `url("${cityBackground}")` }} />
-      <Logo />
+      <div className="login__bgImage" style={{ background: `url(${cityBackground})` }} />

       <form>
         <div className="login__loginWrapper">
           {error && <Message negative icon="error" content={error} />}

@@ -2,18 +2,17 @@
   display: flex;
   justify-content: center;
   align-items: center;
-  width:100%;
+  width: 100%;
   height: 100%;

   &__bgImage {
-    background-size: cover;
     filter: blur(8px);
     -webkit-filter: blur(8px);
     background-size: cover;
     position: absolute;
     top: 0;
     left: 0;
-    z-index: -1;
+    z-index: 0;
     right: 0;
     bottom: 0;
   }
@@ -23,9 +22,14 @@
     border-radius: 30px;
     height: 25rem;
     width: 30rem;
+    z-index: 1;
     background-color: #151313ab;
     display: flex;
     flex-direction: column;
     padding: 2rem;
   }
+
+  form {
+    z-index: 1;
+  }
 }