diff --git a/README.md b/README.md
index 720b906..1f67b21 100644
--- a/README.md
+++ b/README.md
@@ -33,19 +33,6 @@ Helm Chart
-
-
-  What is Krawl? •
-  Installation •
-  Honeypot Pages •
-  Dashboard •
-  Todo •
-  Contributing
-
-
 ## Table of Contents
@@ -62,15 +49,7 @@
 - [Ban Malicious IPs](#use-krawl-to-ban-malicious-ips)
 - [IP Reputation](#ip-reputation)
 - [Forward Server Header](#forward-server-header)
-- [API](#api)
-- [Honeypot](#honeypot)
-  - [robots.txt](#robotstxt)
-  - [Honeypot Pages](#honeypot-pages)
-- [Reverse Proxy Usage](#example-usage-behind-reverse-proxy)
-- [Database Backups](#enable-database-dump-job-for-backups)
-- [Canary Token](#customizing-the-canary-token)
-- [Customizing the Wordlist](#customizing-the-wordlist)
-- [Dashboard](#dashboard)
+- [Additional Documentation](#additional-documentation)
 - [Contributing](#-contributing)

 ## Demo
@@ -92,7 +71,7 @@ It features:
 - **Fake Login Pages**: WordPress, phpMyAdmin, admin panels
 - **Honeypot Paths**: Advertised in robots.txt to catch scanners
 - **Fake Credentials**: Realistic-looking usernames, passwords, API keys
-- **[Canary Token](#customizing-the-canary-token) Integration**: External alert triggering
+- **[Canary Token](docs/canary-token.md) Integration**: External alert triggering
 - **Random server headers**: Confuse attacks based on server header and version
 - **Real-time Dashboard**: Monitor suspicious activity
 - **Customizable Wordlists**: Easy JSON-based configuration
@@ -289,159 +268,17 @@ location / {
 }
 ```

-## API
-Krawl uses the following APIs
-- http://ip-api.com (IP Data)
-- https://iprep.lcrawl.com (IP Reputation)
-- https://nominatim.openstreetmap.org/reverse (Reverse IP Lookup)
-- https://api.ipify.org (Public IP discovery)
-- http://ident.me (Public IP discovery)
-- https://ifconfig.me (Public IP discovery)
+## Additional Documentation

-# Honeypot
-Below is a complete overview of the Krawl honeypot’s capabilities
-
-## robots.txt
-The actual (juicy) robots.txt configuration [is the following](src/templates/html/robots.txt).
-
-## Honeypot pages
-
-### Common Login Attempts
-Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page.
-Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).
-
-![admin page](img/admin-page.png)
-
-### Common Misconfiguration Paths
-Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with “interesting” files, each assigned a random file size to look realistic.
-
-![directory-page](img/directory-page.png)
-
-### Environment File Leakage
-The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a "juicy" misconfiguration that crawlers and scanners often flag as information leakage.
-
-### Server Error Information
-The `/server` page displays randomly generated fake error information for each known server.
-
-![server and env page](img/server-and-env-page.png)
-
-### API Endpoints with Sensitive Data
-The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format
-
-![users and secrets](img/users-and-secrets.png)
-
-### Exposed Credential Files
-The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets
-
-![credentials and passwords](img/credentials-and-passwords.png)
-
-### SQL Injection and XSS Detection
-Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.
-
-![sql injection](img/sql_injection.png)
-
-Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
-
-### Path Traversal Detection
-Krawl detects and responds to **path traversal** attempts targeting common system files like `/etc/passwd`, `/etc/shadow`, or Windows system paths. When an attacker tries to access sensitive files using patterns like `../../../etc/passwd` or encoded variants (`%2e%2e/`, `%252e`), Krawl returns convincing fake file contents with realistic system users, UIDs, GIDs, and shell configurations. This wastes attacker time while logging the full attack pattern.
-
-### XXE (XML External Entity) Injection
-The `/api/xml` and `/api/parser` endpoints accept XML input and are designed to detect **XXE injection** attempts.
-
-# Dashboard
-
-Access the dashboard at `http://<host>:<port>/`
-
-The dashboard shows:
-- Total and unique accesses
-- Suspicious activity and attack detection
-- Top IPs, paths, user-agents and GeoIP localization
-- Real-time monitoring
-
-The attackers’ access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
-
-Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
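As a rough illustration of how such a scoring system can separate scanners from legitimate visitors, here is a minimal sketch (the signal names, weights, and thresholds below are assumptions for illustration, not Krawl's actual implementation):

```python
# Hypothetical request-scoring heuristic (illustrative only; not Krawl's code).
# Each signal adds to a maliciousness score; higher means more suspicious.
SUSPICIOUS_PATHS = {"/admin/", "/wp-admin/", "/phpMyAdmin/", "/.env"}
ATTACK_MARKERS = ("union select", "../", "%2e%2e", "<script")


def score_request(path: str, query: str, user_agent: str) -> int:
    """Return a rough maliciousness score for one request."""
    score = 0
    if path in SUSPICIOUS_PATHS:
        score += 3  # honeypot path, e.g. one advertised in robots.txt
    q = query.lower()
    score += sum(2 for marker in ATTACK_MARKERS if marker in q)
    if not user_agent or "sqlmap" in user_agent.lower():
        score += 5  # missing or known-scanner user agent
    return score
```

A benign request (`score_request("/", "", "Mozilla/5.0")`) scores 0, while a SQLMap probe against a lure endpoint accumulates points from several signals at once.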
-
-![dashboard-1](img/dashboard-1.png)
-
-The top IP Addresses is shown along with top paths and User Agents
-
-![dashboard-2](img/dashboard-2.png)
-
-![dashboard-3](img/dashboard-3.png)
+| Topic | Description |
+|-------|-------------|
+| [API](docs/api.md) | External APIs used by Krawl for IP data, reputation, and geolocation |
+| [Honeypot](docs/honeypot.md) | Full overview of honeypot pages: fake logins, directory listings, credential files, SQLi/XSS/XXE/command injection traps, and more |
+| [Reverse Proxy](docs/reverse-proxy.md) | How to deploy Krawl behind NGINX or use decoy subdomains |
+| [Database Backups](docs/backups.md) | Enable and configure the automatic database dump job |
+| [Canary Token](docs/canary-token.md) | Set up external alert triggers via canarytokens.org |
+| [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings |
+| [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard |

 ## 🤝 Contributing
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 0000000..8d4ab18
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,9 @@
+# API
+
+Krawl uses the following external APIs:
+- http://ip-api.com (IP Data)
+- https://iprep.lcrawl.com (IP Reputation)
+- https://nominatim.openstreetmap.org/reverse (Reverse IP Lookup)
+- https://api.ipify.org (Public IP discovery)
+- http://ident.me (Public IP discovery)
+- https://ifconfig.me (Public IP discovery)
diff --git a/docs/backups.md b/docs/backups.md
new file mode 100644
index 0000000..84bf5db
--- /dev/null
+++ b/docs/backups.md
@@ -0,0 +1,10 @@
+# Enable Database Dump Job for Backups
+
+To enable the database dump job, set the following variables (*example config file*):
+
+```yaml
+backups:
+  path: "backups"        # where backups will be saved
+  cron: "*/30 * * * *"   # frequency of the cron job
+  enabled: true
+```
diff --git a/docs/canary-token.md b/docs/canary-token.md
new file mode 100644
index 0000000..6e6c314
--- /dev/null
+++ b/docs/canary-token.md
@@ -0,0 +1,10 @@
+# Customizing the Canary Token
+
+To create a custom canary token, visit https://canarytokens.org and generate a "Web bug" canary token.
+
+This optional token is triggered when a crawler fully traverses the honeypot pages until the countdown reaches 0; at that point, the canary URL is returned. When this URL is requested, an email alert is sent to the user, including the visitor's IP address and user agent.
+
+To enable this feature, set the canary token URL [using the environment variable](../README.md#configuration-via-enviromental-variables) `KRAWL_CANARY_TOKEN_URL`.
diff --git a/docs/dashboard.md b/docs/dashboard.md
new file mode 100644
index 0000000..ace7955
--- /dev/null
+++ b/docs/dashboard.md
@@ -0,0 +1,21 @@
+# Dashboard
+
+Access the dashboard at `http://<host>:<port>/`
+
+The dashboard shows:
+- Total and unique accesses
+- Suspicious activity and attack detection
+- Top IPs, paths, user agents, and GeoIP localization
+- Real-time monitoring
+
+Attackers' accesses to the honeypot endpoints and related suspicious activities (such as failed login attempts) are logged.
+
+Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
+
+![dashboard-1](../img/dashboard-1.png)
+
+The top IP addresses are shown, along with the top paths and user agents.
+
+![dashboard-2](../img/dashboard-2.png)
+
+![dashboard-3](../img/dashboard-3.png)
diff --git a/docs/honeypot.md b/docs/honeypot.md
new file mode 100644
index 0000000..6baffab
--- /dev/null
+++ b/docs/honeypot.md
@@ -0,0 +1,52 @@
+# Honeypot
+
+Below is a complete overview of the Krawl honeypot's capabilities.
+
+## robots.txt
+The actual (juicy) robots.txt configuration [can be found here](../src/templates/html/robots.txt).
+
+## Honeypot pages
+
+### Common Login Attempts
+Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page.
+Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).
+
+![admin page](../img/admin-page.png)
+
+### Common Misconfiguration Paths
+Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with "interesting" files, each assigned a random file size to look realistic.
+
+![directory-page](../img/directory-page.png)
+
+### Environment File Leakage
+The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a "juicy" misconfiguration that crawlers and scanners often flag as information leakage.
+
+### Server Error Information
+The `/server` page displays randomly generated fake error information for each known server.
+
+![server and env page](../img/server-and-env-page.png)
+
+### API Endpoints with Sensitive Data
+The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format.
+
+![users and secrets](../img/users-and-secrets.png)
+
+### Exposed Credential Files
+The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets.
+
+![credentials and passwords](../img/credentials-and-passwords.png)
+
+### SQL Injection and XSS Detection
+Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.
+
+![sql injection](../img/sql_injection.png)
+
+Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
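The randomized-error trick can be sketched in a few lines. This is a hypothetical illustration assuming a Python backend; the error templates and the `fake_sql_error` name are assumptions, not taken from Krawl's code:

```python
import random

# Hypothetical sketch of randomized database errors (illustrative only).
# Each probe receives a different, plausible-looking error that echoes a
# fragment of the attacker's input, adding noise to automated fingerprinting.
SQL_ERRORS = [
    "You have an error in your SQL syntax; check the manual that corresponds "
    "to your MySQL server version for the right syntax to use near '{frag}'",
    'ERROR: syntax error at or near "{frag}" (PostgreSQL)',
    "ORA-00933: SQL command not properly ended near '{frag}'",
    "Unclosed quotation mark after the character string '{frag}'.",
]


def fake_sql_error(payload: str) -> str:
    """Return a randomly chosen DB error echoing part of the attacker's payload."""
    fragment = payload[:20]  # echo just enough to look like a real parser error
    return random.choice(SQL_ERRORS).format(frag=fragment)
```

Because every response picks a different dialect and message, heuristics that rely on a stable error signature (as SQLMap's DBMS fingerprinting does) see contradictory results across requests.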
+
+### Path Traversal Detection
+Krawl detects and responds to **path traversal** attempts targeting common system files like `/etc/passwd`, `/etc/shadow`, or Windows system paths. When an attacker tries to access sensitive files using patterns like `../../../etc/passwd` or encoded variants (`%2e%2e/`, `%252e`), Krawl returns convincing fake file contents with realistic system users, UIDs, GIDs, and shell configurations. This wastes attacker time while logging the full attack pattern.
+
+### XXE (XML External Entity) Injection
+The `/api/xml` and `/api/parser` endpoints accept XML input and are designed to detect **XXE injection** attempts. When attackers try to exploit external entity declarations (`