Compare commits

...

22 Commits
5.4.3 ... 5.4.8

Author SHA1 Message Date
orangecoding
5347d0014d adding scraping ant infos as we now need to use residental proxies 2022-03-25 11:19:17 +01:00
Christian Kellner
946b70003f dependency management | fixing dev mode 2022-03-09 14:28:13 +01:00
Christian Kellner
a6e6656882 Update package.json
next version
2022-03-09 08:59:41 +01:00
Stefan Berger
fbea1aabc4 Add mattermost adapter (#49) (#50) 2022-03-09 05:44:39 +01:00
orangecoding
2dd01ca38f fixing comments 2022-02-15 13:18:20 +01:00
Carl Ambroselli
f010e8951b Add sqlite adapter (#48) 2022-02-15 13:15:27 +01:00
Carl Ambroselli
5225098006 Fix address of immonet (#47) 2022-02-15 13:14:49 +01:00
orangecoding
6e6144e02f fixing tests 2022-02-15 13:14:19 +01:00
Christian Kellner
aa49773a4d improve readme 2022-01-31 10:54:12 +01:00
Jochen Schalanda
b6b8d6814c Make requestDriver more resilient to errors (#46)
If the async request performed in `requestDriver.makeDriver()` fails, it would call the `callback` function with empty parameters but then continue the execution which can lead to the following error and crash of Fredy:
```
Error while trying to scrape data. Received error: Request failed with status code 504
/fredy/lib/services/requestDriver.js:25
    if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
                      ^

TypeError: Cannot read properties of undefined (reading 'data')
    at driver (/fredy/lib/services/requestDriver.js:25:23)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
```
2022-01-31 10:48:11 +01:00
Christian Kellner
b8d658a948 improve instana monitoring 2022-01-31 10:17:46 +01:00
Christian Kellner
bce0c57b02 adding instana monitoring for fredy 2022-01-31 10:01:31 +01:00
Christian Kellner
5e547baa76 Update diagram 2022-01-29 19:35:13 +01:00
Christian Kellner
b368ca7ab8 diagram.drawio 2022-01-29 19:29:02 +01:00
Christian Kellner
eb85641dfb Update CHANGELOG.md 2022-01-28 08:50:46 +01:00
Christian Kellner
0a13037b83 Update CHANGELOG.md
upgrading changelog
2022-01-28 08:50:21 +01:00
Christian Kellner
5600b9766b Update FredyRuntime.js
remove unnecessary todo
2022-01-26 14:48:38 +01:00
Christian Kellner
63b232521e next version 2022-01-26 14:42:23 +01:00
Jochen Schalanda
2f5cc31ae3 Add support for Immo Südwest Presse (immo.swp.de) (#45) 2022-01-26 14:41:44 +01:00
Jochen Schalanda
70e78492ec Telegram: Use job name instead of ID and link in title (#44) 2022-01-26 14:39:53 +01:00
Jochen Schalanda
47adb88cb5 Fix race condition if user ID is in session but not in user store (#43) 2022-01-25 15:11:21 +01:00
Jochen Schalanda
e5627e1d02 Allow visiting the original provider URL (#42)
Instead of truncating the original URL of each provider in the job configuration to 60 characters and losing a lot of context information, put a link to the original URL in the provider table which can be opened directly to verify what is being scraped by Fredy.
2022-01-25 14:20:42 +01:00
24 changed files with 1929 additions and 979 deletions

2
.nvmrc
View File

@@ -1 +1 @@
12.18.3
16.14.0

View File

@@ -1,4 +1,21 @@
###### [V5.4.0]
###### [V5.4.5]
- Adding Instana node.js monitoring
###### [V5.4.4]
- Add support for Immo Südwest Presse (immo.swp.de)
- Telegram: Use job name instead of ID and link in title
- Fix race condition if user ID is in session but not in user store
- Allow visiting the original provider URL
###### [V5.4.3]
- re-writing readme
- improving docker build
- using github's actions to build docker and test automatically
###### [V5.4.2]
- Fixing prod build
###### [V5.4.1]
- Upgrading dependencies
- Provider urls are now automagically been changed to include the correct sort order for search results
@@ -45,4 +62,4 @@ on the new ui and use the values from your previous config file if needed.
[BREAKING CHANGES]
- The config has been changed, the config of V1.x will not work any longer
- Sources have been renamed to provider
```
```

View File

@@ -17,7 +17,7 @@ function normalize(o) {
return Object.assign(o, { id });
}
//apply blaclist if needed
//apply blacklist if needed
function applyBlacklist(o) {
const titleNotBlacklisted = !utils.isOneOf(o.title, appliedBlackList);
const descNotBlacklisted = !utils.isOneOf(o.description, appliedBlackList);

View File

@@ -69,7 +69,7 @@ yarn run test
# Architecture
![Architecture](/doc/architecture.jpg "Architecture")
## Immoscout
### Immoscout
I have added **experimental** support for Immoscout. Immoscout is somewhat special, because they have decided to secure their service from bots using Re-Capture. Finding a way around this is barely possible. For _Fredy_ to be able to bypass this check, I'm using a service called [ScrapingAnt](https://scrapingant.com/). The trick is to use a headless browser, rotating proxies and (once successfully validated) to re-send the cookies each time.
To be able to use Immoscout, you need to create an account at ScrapingAnt. Configure the API key in the "General Settings" tab (visible when logged in as administrator).
@@ -77,9 +77,15 @@ The rest will be handled by _Fredy_. Keep in mind, the support is experimental.
If you need more than the 1000 API calls allowed per month, I'd suggest opting for a paid account... ScrapingAnt loves OpenSource, therefore they have decided to give all _Fredy_ users a 10% discount by using the code **FREDY10** (Disclaimer: I do not earn any money for recommending their service).
#### Contribution guidelines
### Contribution guidelines
See [Contributing](https://github.com/orangecoding/fredy/blob/master/CONTRIBUTING.md)
See [Contributing](https://github.com/orangecoding/fredy/blob/master/CONTRIBUTING.md)
### Monitoring
_Fredy_ can be monitored by [Instana](https://www.instana.com). If you are interested, sign up for a free trial. This is totally optional of course :)
If you want to use Instana to monitor _Fredy_, please change the variable `INSTANA_MONITORING` in the `.env` file to `true`.
If you want to know more, head over to the [Instana docs](https://www.ibm.com/docs/en/obi/current?topic=technologies-monitoring-nodejs).
# Docker
Use the Dockerfile in this repository to build an image.

View File

@@ -0,0 +1,84 @@
<mxfile host="app.diagrams.net" modified="2022-01-29T18:34:51.211Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36" etag="W0jmvptvMSkuHq89hwUy" version="16.5.2" type="github">
<diagram id="C5RBs43oDa-KdzZeNtuy" name="Page-1">
<mxGraphModel dx="850" dy="907" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
<root>
<mxCell id="WIyWlLk6GJQsqaUBKTNV-0" />
<mxCell id="WIyWlLk6GJQsqaUBKTNV-1" parent="WIyWlLk6GJQsqaUBKTNV-0" />
<mxCell id="4kAlOAlRylSy7JMoHAEd-5" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="WIyWlLk6GJQsqaUBKTNV-3" target="WIyWlLk6GJQsqaUBKTNV-7">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="WIyWlLk6GJQsqaUBKTNV-3" value="Job1" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#dae8fc;strokeColor=#6c8ebf;" parent="WIyWlLk6GJQsqaUBKTNV-1" vertex="1">
<mxGeometry x="100" y="50" width="120" height="30" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-8" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="WIyWlLk6GJQsqaUBKTNV-7" target="4kAlOAlRylSy7JMoHAEd-2">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-10" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="WIyWlLk6GJQsqaUBKTNV-7" target="4kAlOAlRylSy7JMoHAEd-3">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-11" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="WIyWlLk6GJQsqaUBKTNV-7" target="4kAlOAlRylSy7JMoHAEd-4">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="WIyWlLk6GJQsqaUBKTNV-7" value="FredyRuntime" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#fff2cc;strokeColor=#d6b656;" parent="WIyWlLk6GJQsqaUBKTNV-1" vertex="1">
<mxGeometry x="110" y="120" width="360" height="40" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-6" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="4kAlOAlRylSy7JMoHAEd-0" target="WIyWlLk6GJQsqaUBKTNV-7">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-0" value="Job2" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="230" y="50" width="120" height="30" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-7" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="4kAlOAlRylSy7JMoHAEd-1" target="WIyWlLk6GJQsqaUBKTNV-7">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-1" value="Job3" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="360" y="50" width="120" height="30" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-13" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="4kAlOAlRylSy7JMoHAEd-2" target="4kAlOAlRylSy7JMoHAEd-12">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-2" value="Provider1" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="100" y="210" width="120" height="40" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-14" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="4kAlOAlRylSy7JMoHAEd-3">
<mxGeometry relative="1" as="geometry">
<mxPoint x="290" y="290" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-3" value="Provider2" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="230" y="210" width="120" height="40" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-15" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="4kAlOAlRylSy7JMoHAEd-4" target="4kAlOAlRylSy7JMoHAEd-12">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-4" value="Provider3" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="360" y="210" width="120" height="40" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-17" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="4kAlOAlRylSy7JMoHAEd-12" target="4kAlOAlRylSy7JMoHAEd-16">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-12" value="Similarity check" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#e1d5e7;strokeColor=#9673a6;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="110" y="290" width="360" height="40" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-20" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="4kAlOAlRylSy7JMoHAEd-16" target="4kAlOAlRylSy7JMoHAEd-18">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-16" value="Found similarity" style="rhombus;whiteSpace=wrap;html=1;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="250" y="360" width="80" height="80" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-21" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="WIyWlLk6GJQsqaUBKTNV-1" source="4kAlOAlRylSy7JMoHAEd-18" target="4kAlOAlRylSy7JMoHAEd-19">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-18" value="Notification Adapter1" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#f8cecc;strokeColor=#b85450;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="230" y="460" width="120" height="40" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-19" value="Notification Adapter2" style="rounded=1;whiteSpace=wrap;html=1;fontSize=12;glass=0;strokeWidth=1;shadow=0;fillColor=#f8cecc;strokeColor=#b85450;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="230" y="520" width="120" height="40" as="geometry" />
</mxCell>
<mxCell id="4kAlOAlRylSy7JMoHAEd-22" value="No" style="text;html=1;resizable=0;autosize=1;align=center;verticalAlign=middle;points=[];fillColor=none;strokeColor=none;rounded=0;" vertex="1" parent="WIyWlLk6GJQsqaUBKTNV-1">
<mxGeometry x="300" y="440" width="30" height="20" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>

View File

@@ -62,7 +62,6 @@ class FredyRuntime {
if (this._providerConfig.paginate != null) {
xray(u, this._providerConfig.crawlContainer, [this._providerConfig.crawlFields])
//the first 2 pages should be enough here
//TODO: Think about automagically sort by date
.limit(2)
.paginate(this._providerConfig.paginate)
.then((listings) => {

View File

@@ -1,10 +1,10 @@
const service = require('restana')();
const jobRouter = service.newRouter();
const axios = require('axios');
const jobStorage = require('../../services/storage/jobStorage');
const userStorage = require('../../services/storage/userStorage');
const immoscoutProvider = require('../../provider/immoscout');
const config = require('../../../conf/config.json');
const { isAdmin } = require('../security');
function doesJobBelongsToUser(job, req) {
@@ -30,9 +30,23 @@ jobRouter.get('/', async (req, res) => {
});
jobRouter.get('/processingTimes', async (req, res) => {
let scrapingAntData = null;
if (config.scrapingAnt.apiKey != null && config.scrapingAnt.apiKey.length > 0) {
try {
const result = await axios({
url: `https://api.scrapingant.com/v1/usage?x-api-key=${config.scrapingAnt.apiKey}`,
});
scrapingAntData = result.data;
} catch (Exception) {
console.error('Could not query plan data from scraping ant.', Exception);
}
}
res.body = {
interval: config.interval,
lastRun: config.lastRun || null,
scrapingAntData,
};
res.send();

View File

@@ -5,13 +5,13 @@ const hasher = require('../../services/security/hash');
loginRouter.get('/user', async (req, res) => {
const currentUserId = req.session.currentUser;
const isAdmin = currentUserId == null ? false : userStorage.getUser(currentUserId).isAdmin;
if (currentUserId == null) {
const currentUser = currentUserId == null ? null : userStorage.getUser(currentUserId);
if (currentUser == null) {
res.body = {};
} else {
res.body = {
userId: currentUserId,
isAdmin,
userId: currentUser.id,
isAdmin: currentUser.isAdmin,
};
}
res.send();

View File

@@ -0,0 +1,52 @@
const { markdown2Html } = require('../../services/markdown');
const { getJob } = require('../../services/storage/jobStorage');
const axios = require('axios');
/**
* sends new listings to mattermost
* @param serviceName e.g immowelt
* @param newListings an array with newly found listings
* @param notificationConfig config of this notification adapter
* @param jobKey name of the current job that is being executed
* @returns {Promise<Void> | void}
*/
exports.send = ({ serviceName, newListings, notificationConfig, jobKey }) => {
const { webhook, channel } = notificationConfig.find((adapter) => adapter.id === 'mattermost').fields;
const job = getJob(jobKey);
const jobName = job == null ? jobKey : job.name;
let message = `### *${jobName}* (${serviceName}) found **${newListings.length}** new listings:\n\n`;
message += `| Title | Address | Size | Price |\n|:----|:----|:----|:----|\n`;
message += newListings.map(
(o) => `| [${o.title}](${o.link}) | ` + [o.address, o.size.replace(/2m/g, '$m^2$'), o.price].join(' | ') + ' |\n'
);
return axios.post(`${webhook}`, {
channel: channel,
text: message,
});
};
/**
* exported config is being used in the frontend to generate the fields
* incoming values will be the keys (and values) of the fields
*
*/
exports.config = {
id: __filename.slice(__dirname.length + 1, -3),
name: 'Mattermost',
readme: markdown2Html('lib/notification/adapter/mattermost.md'),
description: 'Fredy will send new listings to your mattermost team chat.',
fields: {
webhook: {
type: 'text',
label: 'Webhook-URL',
description: 'The incoming webhook url',
},
channel: {
type: 'text',
label: 'Channel',
description: 'The channel where fredy should send notifications to.',
},
},
};

View File

@@ -0,0 +1,5 @@
### Mattermost Adapter
For Mattermost, you need to create a incoming webhook. This is pretty easy. Please visit the steps in the [developer docs](https://docs.mattermost.com/developer/webhooks-incoming.html) and follow the instructions.
As a result, you get the webhook URL for configuration in fredy. In addition, the target channel must be defined.

View File

@@ -0,0 +1,33 @@
const { markdown2Html } = require('../../services/markdown');
const Database = require('better-sqlite3');
/**
* Stores data in a sqlite db in order to use the search results for later analytics
* @param serviceName e.g immowelt
* @param newListings an array with newly found listings
* @param jobKey name of the current job that is being executed
*/
exports.send = ({ serviceName, newListings, jobKey }) => {
const db = new Database('db/listings.db');
const fields = ['serviceName', 'jobKey', 'id', 'size', 'rooms', 'price', 'address', 'title', 'link', 'description'];
db.prepare(`CREATE TABLE IF NOT EXISTS listing (${fields.join(' TEXT, ')} TEXT);`).run();
const insert = db.prepare(`INSERT INTO listing (${fields.join(', ')}) VALUES (@${fields.join(', @')})`);
newListings.map((listing) => {
let insertListing = {};
fields.map((field) => {
insertListing[field] = listing[field];
});
insertListing.serviceName = serviceName;
insertListing.jobKey = jobKey;
insert.run(insertListing);
});
return Promise.resolve();
};
exports.config = {
id: __filename.slice(__dirname.length + 1, -3),
name: 'Sqlite',
description: 'This adapter stores listings in a local sqlite3 database.',
config: {},
readme: markdown2Html('lib/notification/adapter/sqlite.md'),
};

View File

@@ -0,0 +1,3 @@
### Sqlite Adapter
This adapter stores search results in an sqlite database in db/listings.db

View File

@@ -1,4 +1,5 @@
const { markdown2Html } = require('../../services/markdown');
const { getJob } = require('../../services/storage/jobStorage');
const axios = require('axios');
/**
@@ -19,23 +20,24 @@ const arrayChunks = (inputArray, perChunk) =>
* @param serviceName e.g immowelt
* @param newListings an array with newly found listings
* @param notificationConfig config of this notification adapter
* * @param jobKey name of the current job that is being executed
* @param jobKey name of the current job that is being executed
* @returns {Promise<Void> | void}
*/
exports.send = ({ serviceName, newListings, notificationConfig, jobKey }) => {
const { token, chatId } = notificationConfig.find((adapter) => adapter.id === 'telegram').fields;
const job = getJob(jobKey);
const jobName = job == null ? jobKey : job.name;
//we have to split messages into chunk, because otherwise messages are going to become too big and will fail
const chunks = arrayChunks(newListings, 3);
const promises = chunks.map((chunk) => {
let message = `Job: ${jobKey} | Service <b>${serviceName}</b> found <b>${newListings.length}</b> new listings:\n\n`;
let message = `<i>${jobName}</i> (${serviceName}) found <b>${newListings.length}</b> new listings:\n\n`;
message += chunk.map(
(o) =>
`<b>${shorten(o.title.replace(/\*/g, ''), 45)}</b>\n` +
`<a href="${o.link}"><b>${shorten(o.title.replace(/\*/g, ''), 45).trim()}</b></a>\n` +
[o.address, o.price, o.size].join(' | ') +
'\n' +
`<a href="${o.link}">${o.link}</a>\n\n`
'\n\n'
);
return axios.post(`https://api.telegram.org/bot${token}/sendMessage`, {

View File

@@ -6,7 +6,7 @@ function normalize(o) {
const id = parseInt(o.id.substring(o.id.indexOf('_') + 1, o.id.length));
const size = o.size != null ? o.size.replace('Wohnfläche ', '') : 'N/A m²';
const price = o.price.replace('Kaufpreis ', '');
const address = o.address.split(' • ')[1];
const address = o.address.split(' • ')[o.address.split(' • ').length - 1];
const title = o.title || 'No title available';
//normally we would just read the link from the source, but immonet decided to trick user by adding a click listener instead of
//a href to do some weird reporting. (Very user friendly for handicaped ppl... not)

52
lib/provider/immoswp.js Executable file
View File

@@ -0,0 +1,52 @@
const utils = require('../utils');
let appliedBlackList = [];
function normalize(o) {
const id = o.id.substring(o.id.indexOf('-') + 1, o.id.length);
const size = o.size || 'N/A m²';
const price = (o.price || '--- €').replace('Preis auf Anfrage', '--- €');
const address = o.address || 'No address available';
const title = o.title || 'No title available';
const link = `https://immo.swp.de/immobilien/${id}`;
const description = o.description;
return Object.assign(o, { id, address, price, size, title, link, description });
}
function applyBlacklist(o) {
const titleNotBlacklisted = !utils.isOneOf(o.title, appliedBlackList);
const descNotBlacklisted = !utils.isOneOf(o.description, appliedBlackList);
return titleNotBlacklisted && descNotBlacklisted;
}
const config = {
url: null,
crawlContainer: '.js-serp-item',
sortByDateParam: 's=most_recently_updated_first',
crawlFields: {
id: '@id',
price: 'div.item__spec.item-spec-price | trim',
size: 'div.item__spec.item-spec-area | trim',
title: 'a.js-item-title-link@title',
address: 'div.item__locality | removeNewline | trim',
description: 'div.item__main-info-points.clearfix p small | removeNewline | trim',
},
paginate: 'li.page-item.pagination__item a.page-link@href',
normalize: normalize,
filter: applyBlacklist,
};
exports.init = (sourceConfig, blacklist) => {
config.enabled = sourceConfig.enabled;
config.url = sourceConfig.url;
appliedBlackList = blacklist || [];
};
exports.metaInformation = {
name: 'Immo Südwest Presse',
baseUrl: 'https://immo.swp.de/',
id: __filename.slice(__dirname.length + 1, -3),
};
exports.config = config;

View File

@@ -7,30 +7,29 @@ function makeDriver(headers = {}) {
let cookies = '';
return async function driver(context, callback) {
const url = context.url;
let result;
try {
result = await axios({
const url = context.url;
const result = await axios({
url,
headers: {
...headers,
Cookie: cookies,
},
});
if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
//assume we have gotten a response from scrapingAnt
if (cookies.length === 0) {
cookies = result.data.cookies;
}
callback(null, result.data.content);
} else {
callback(null, result.data);
}
} catch (exception) {
console.error(`Error while trying to scrape data. Received error: ${exception.message}`);
callback(null, []);
}
if (typeof result.data === 'object' && url.toLowerCase().indexOf('scrapingant') !== -1) {
//assume we have gotten a response from scrapingAnt
if (cookies.length === 0) {
cookies = result.data.cookies;
}
callback(null, result.data.content);
} else {
callback(null, result.data);
}
};
}

View File

@@ -1,5 +1,5 @@
const { metaInformation } = require('../provider/immoscout');
//to better confure re-capture chose a random proxy each time we do a call
//to better configure re-capture chose a random proxy each time we do a call
const proxies = ['ae', 'br', 'cn', 'de', 'es', 'fr', 'gb', 'hk', 'in', 'it', 'il', 'jp', 'nl', 'ru', 'sa', 'us', 'cz'];
const config = require('../../conf/config.json');
@@ -12,7 +12,9 @@ exports.transformUrlForScrapingAnt = (url, id) => {
if (isImmoscout(id)) {
//only do calls to scrapingAnt when dealing with Immoscout
url = `https://api.scrapingant.com/v1/general?url=${encodeURIComponent(url)}&proxy_country=${randomProxy}`;
url = `https://api.scrapingant.com/v1/general?url=${encodeURIComponent(
url
)}&proxy_country=${randomProxy}&proxy_type=residential`;
}
return url;
};

View File

@@ -1,10 +1,10 @@
{
"name": "fredy",
"version": "5.4.3",
"version": "5.4.8",
"description": "[F]ind [R]eal [E]states [d]amn eas[y].",
"scripts": {
"start": "node index.js",
"dev": "yarn && export BUILD_DEV='true' && export NODE_ENV='development' && webpack-dev-server --progress --colors --watch --config ./webpack.dev.js",
"dev": "yarn && export BUILD_DEV='true' && export NODE_ENV='development' && webpack serve --progress --color --config ./webpack.dev.js",
"prod": "export BUILD_DEV='false' && webpack --node-env=production --config ./webpack.prod.js",
"format": "prettier --write lib/**/*.js ui/src/**/*.js test/**/*.js *.js --single-quote --print-width 120",
"test": "mocha --timeout 20000 test/**/*.test.js",
@@ -43,8 +43,8 @@
},
"license": "MIT",
"engines": {
"node": ">=12.13.0",
"npm": ">=6.0.0"
"node": ">=14.0.0",
"npm": ">=7.0.0"
},
"browserslist": [
"> 0.5%",
@@ -55,19 +55,20 @@
"dependencies": {
"@rematch/core": "2.2.0",
"@rematch/loading": "2.1.2",
"@sendgrid/mail": "7.6.0",
"axios": "0.24.0",
"@sendgrid/mail": "7.6.2",
"axios": "0.26.1",
"axios-retry": "^3.2.4",
"body-parser": "1.19.0",
"cookie-session": "1.4.0",
"better-sqlite3": "^7.5.0",
"body-parser": "1.19.2",
"cookie-session": "2.0.0",
"handlebars": "4.7.7",
"highcharts": "9.3.1",
"highcharts": "10.0.0",
"highcharts-react-official": "3.1.0",
"lowdb": "1.0.0",
"markdown": "^0.5.0",
"nanoid": "3.1.30",
"node-mailjet": "3.3.4",
"query-string": "^7.0.1",
"nanoid": "3.3.1",
"node-mailjet": "3.3.7",
"query-string": "7.1.1",
"react": "17.0.2",
"react-dom": "17.0.2",
"react-redux": "7.2.6",
@@ -75,41 +76,41 @@
"react-router-dom": "5.3.0",
"react-switch": "^6.0.0",
"redux": "4.1.2",
"redux-thunk": "2.4.0",
"restana": "4.9.2",
"semantic-ui-react": "2.0.4",
"serve-static": "^1.14.1",
"redux-thunk": "2.4.1",
"restana": "4.9.3",
"semantic-ui-react": "2.1.2",
"serve-static": "1.15.0",
"slack": "11.0.2",
"string-similarity": "^4.0.4",
"x-ray": "2.3.4"
},
"devDependencies": {
"@babel/core": "7.16.0",
"@babel/preset-env": "7.16.4",
"@babel/preset-react": "7.16.0",
"@babel/core": "7.17.8",
"@babel/preset-env": "7.16.11",
"@babel/preset-react": "7.16.7",
"babel-eslint": "10.1.0",
"babel-loader": "8.2.3",
"chai": "4.3.4",
"babel-loader": "8.2.4",
"chai": "4.3.6",
"clean-webpack-plugin": "4.0.0",
"copy-webpack-plugin": "10.0.0",
"css-loader": "6.5.1",
"copy-webpack-plugin": "10.2.4",
"css-loader": "6.7.1",
"eslint": "7.32.0",
"eslint-config-prettier": "8.3.0",
"eslint-plugin-react": "7.27.1",
"eslint-config-prettier": "8.5.0",
"eslint-plugin-react": "7.29.3",
"file-loader": "6.2.0",
"history": "5.1.0",
"history": "5.3.0",
"husky": "4.3.8",
"less": "4.1.2",
"less-loader": "10.2.0",
"lint-staged": "12.1.2",
"mocha": "9.1.3",
"prettier": "2.5.0",
"lint-staged": "12.3.7",
"mocha": "9.2.2",
"prettier": "2.6.1",
"proxyquire": "2.1.3",
"redux-logger": "3.0.6",
"style-loader": "3.3.1",
"url-loader": "4.1.1",
"webpack": "5.64.4",
"webpack-cli": "4.9.1",
"webpack": "5.70.0",
"webpack-cli": "4.9.2",
"webpack-dev-server": "3.11.2",
"webpack-merge": "5.8.0"
}

View File

@@ -0,0 +1,51 @@
const similarityCache = require('../../lib/services/similarity-check/similarityCache');
const mockNotification = require('../mocks/mockNotification');
const providerConfig = require('./testProvider.json');
const mockStore = require('../mocks/mockStore');
const proxyquire = require('proxyquire').noCallThru();
const expect = require('chai').expect;
const provider = require('../../lib/provider/immoswp');
describe('#immoswp testsuite()', () => {
after(() => {
similarityCache.stopCacheCleanup();
});
provider.init(providerConfig.immoswp, [], []);
const Fredy = proxyquire('../../lib/FredyRuntime', {
'./services/storage/listingsStorage': {
...mockStore,
},
'./notification/notify': mockNotification,
});
it('should test immoswp provider', async () => {
return await new Promise((resolve) => {
const fredy = new Fredy(provider.config, null, provider.metaInformation.id, 'test1', similarityCache);
fredy.execute().then((listing) => {
expect(listing).to.be.a('array');
const notificationObj = mockNotification.get();
expect(notificationObj).to.be.a('object');
expect(notificationObj.serviceName).to.equal('immoswp');
notificationObj.payload.forEach((notify) => {
/** check the actual structure **/
expect(notify.id).to.be.a('string');
expect(notify.price).to.be.a('string');
expect(notify.size).to.be.a('string');
expect(notify.title).to.be.a('string');
expect(notify.link).to.be.a('string');
expect(notify.address).to.be.a('string');
/** check the values if possible **/
expect(notify.price).that.does.include('€');
expect(notify.title).to.be.not.empty;
expect(notify.link).that.does.include('https://immo.swp.de');
expect(notify.address).to.be.not.empty;
});
resolve();
});
});
});
});

View File

@@ -32,13 +32,13 @@ describe('#immowelt testsuite()', () => {
/** check the actual structure **/
expect(notify.id).to.be.a('string');
expect(notify.price).to.be.a('string');
expect(notify.size).to.be.a('string');
expect(notify.title).to.be.a('string');
expect(notify.link).to.be.a('string');
expect(notify.address).to.be.a('string');
/** check the values if possible **/
if (notify.size.trim().toLowerCase() !== 'k.a.') {
if (notify.size != null && notify.size.trim().toLowerCase() !== 'k.a.') {
expect(notify.size).that.does.include('m²');
}
expect(notify.title).to.be.not.empty;

View File

@@ -16,6 +16,10 @@
"url": "https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/duesseldorf/wohnung-mieten?enteredFrom=one_step_search",
"enabled": true
},
"immoswp": {
"url": "https://immo.swp.de/suchergebnisse?l=M%C3%BCnchen&r=0km&_multiselect_r=0km&ut=private&t=apartment%3Arental&a=de.muenchen&pf=&pt=&rf=0&rt=0&sf=50&st=&yf=&yt=&ff=&ft=&s=most_recently_updated_first&pa=&o=&ad=&u=",
"enabled": true
},
"kalaydo": {
"url": "https://www.kalaydo.de/immobilien/eigentumswohnung-kaufen/o/duesseldorf/4/?attr_gt_estate_size_living_area=90.0&attr_gt_no_of_rooms=3.5&maxPrice=420000.00&radius=5&resultsPerPage=50&sorting=-date",
"enabled": true

View File

@@ -12,10 +12,6 @@ const emptyTable = () => {
);
};
const truncate = (str, n) => {
return str.length > n ? str.substr(0, n - 1) + '…' : str;
};
const content = (providerData, onRemove) => {
return (
<Fragment>
@@ -23,7 +19,11 @@ const content = (providerData, onRemove) => {
return (
<Table.Row key={data.id}>
<Table.Cell>{data.name}</Table.Cell>
<Table.Cell>{truncate(data.url, 60)}</Table.Cell>
<Table.Cell>
<a href={data.url} target="_blank" rel="noopener noreferrer">
Visit site
</a>
</Table.Cell>
<Table.Cell>
<div style={{ float: 'right' }}>
<Button circular color="red" icon="trash" onClick={() => onRemove(data.id)} />

View File

@@ -1,25 +1,46 @@
import React from 'react';
import { format } from '../../services/time/timeService';
import { Label } from 'semantic-ui-react';
import { Header, Label, Message, Segment } from 'semantic-ui-react';
export default function ProcessingTimes({ processingTimes }) {
return (
<React.Fragment>
<Label as="span" color="black">
Processing Interval:
<Label.Detail>{processingTimes.interval} min</Label.Detail>
</Label>
{processingTimes.lastRun && (
<React.Fragment>
<Label as="span" color="black">
Last run:
<Label.Detail>{format(processingTimes.lastRun)}</Label.Detail>
</Label>
<Label as="span" color="black">
Next run:
<Label.Detail>{format(processingTimes.lastRun + processingTimes.interval * 60000)}</Label.Detail>
</Label>
</React.Fragment>
<div>
<Label as="span" color="black">
Processing Interval:
<Label.Detail>{processingTimes.interval} min</Label.Detail>
</Label>
{processingTimes.lastRun && (
<React.Fragment>
<Label as="span" color="black">
Last run:
<Label.Detail>{format(processingTimes.lastRun)}</Label.Detail>
</Label>
<Label as="span" color="black">
Next run:
<Label.Detail>{format(processingTimes.lastRun + processingTimes.interval * 60000)}</Label.Detail>
</Label>
</React.Fragment>
)}
</div>
{processingTimes.scrapingAntData != null && (
<Segment inverted>
<Header as="h5">Remaining ScrapingAnt calls</Header>
<Message.List>
<Message.Item>Plan: {processingTimes.scrapingAntData.plan_name}</Message.Item>
<Message.Item>
Duration: {format(new Date(processingTimes.scrapingAntData.start_date))} -{' '}
{format(new Date(processingTimes.scrapingAntData.end_date))}
</Message.Item>
<Message.Item>
Credits: {processingTimes.scrapingAntData.remained_credits}/
{processingTimes.scrapingAntData.plan_total_credits} (250 credits per call)
</Message.Item>
</Message.List>
If you want to scrape Immoscout more often, you have to purchase a premium account of ScrapingAnt. You can use
the code <b>FREDY10</b> to get 10% off. (No affiliation, we are <b>not</b> getting paid to recommend
ScrapingAnt.
</Segment>
)}
</React.Fragment>
);

2387
yarn.lock

File diff suppressed because it is too large Load Diff