docs: add comprehensive documentation for API, backups, canary token, dashboard, honeypot, reverse proxy, and wordlist customization

2026-03-01 21:20:33 +01:00
parent b6fbcabdee
commit 2e4e494636
8 changed files with 161 additions and 175 deletions
--- a/README.md
+++ b/README.md
@@ -33,19 +33,6 @@
    <img src="https://img.shields.io/badge/helm-chart-0F1689?logo=helm&logoColor=white" alt="Helm Chart">
  </a>
 </div>
-
-<br>
-
-<p align="center">
-  <a href="#what-is-krawl">What is Krawl?</a> •
-  <a href="#-installation">Installation</a> •
-  <a href="#honeypot-pages">Honeypot Pages</a> •
-  <a href="#dashboard">Dashboard</a> •
-  <a href="./ToDo.md">Todo</a> •
-  <a href="#-contributing">Contributing</a>
-</p>
-
-<br>
 </div>

 ## Table of Contents
@@ -62,15 +49,7 @@
 - [Ban Malicious IPs](#use-krawl-to-ban-malicious-ips)
 - [IP Reputation](#ip-reputation)
 - [Forward Server Header](#forward-server-header)
-- [API](#api)
-- [Honeypot](#honeypot)
-  - [robots.txt](#robotstxt)
-  - [Honeypot Pages](#honeypot-pages)
-- [Reverse Proxy Usage](#example-usage-behind-reverse-proxy)
-- [Database Backups](#enable-database-dump-job-for-backups)
-- [Canary Token](#customizing-the-canary-token)
-- [Customizing the Wordlist](#customizing-the-wordlist)
-- [Dashboard](#dashboard)
+- [Additional Documentation](#additional-documentation)
 - [Contributing](#-contributing)

 ## Demo
@@ -92,7 +71,7 @@ It features:
 - **Fake Login Pages**: WordPress, phpMyAdmin, admin panels
 - **Honeypot Paths**: Advertised in robots.txt to catch scanners
 - **Fake Credentials**: Realistic-looking usernames, passwords, API keys
-- **[Canary Token](#customizing-the-canary-token) Integration**: External alert triggering
+- **[Canary Token](docs/canary-token.md) Integration**: External alert triggering
 - **Random server headers**: Confuse attacks based on server header and version
 - **Real-time Dashboard**: Monitor suspicious activity
 - **Customizable Wordlists**: Easy JSON-based configuration
@@ -289,159 +268,17 @@ location / {
 }
 ```

-## API
-Krawl uses the following APIs
-- http://ip-api.com (IP Data)
-- https://iprep.lcrawl.com (IP Reputation)
-- https://nominatim.openstreetmap.org/reverse (Reverse IP Lookup)
-- https://api.ipify.org (Public IP discovery)
-- http://ident.me (Public IP discovery)
-- https://ifconfig.me (Public IP discovery)
+## Additional Documentation

-# Honeypot
-Below is a complete overview of the Krawl honeypot’s capabilities
-
-## robots.txt
-The actual (juicy) robots.txt configuration [is the following](src/templates/html/robots.txt).
-
-## Honeypot pages
-
-### Common Login Attempts
-Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).
-
-![admin page](img/admin-page.png)
-
-### Common Misconfiguration Paths
-Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with “interesting” files, each assigned a random file size to look realistic.
-
-![directory-page](img/directory-page.png)
-
-### Environment File Leakage
-The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a "juicy" misconfiguration that crawlers and scanners often flag as information leakage.
-
-### Server Error Information
-The `/server` page displays randomly generated fake error information for each known server.
-
-![server and env page](img/server-and-env-page.png)
-
-### API Endpoints with Sensitive Data
-The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format
-
-![users and secrets](img/users-and-secrets.png)
-
-### Exposed Credential Files
-The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets
-
-![credentials and passwords](img/credentials-and-passwords.png)
-
-### SQL Injection and XSS Detection
-Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.
-
-![sql injection](img/sql_injection.png)
-
-Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
-
-### Path Traversal Detection
-Krawl detects and responds to **path traversal** attempts targeting common system files like `/etc/passwd`, `/etc/shadow`, or Windows system paths. When an attacker tries to access sensitive files using patterns like `../../../etc/passwd` or encoded variants (`%2e%2e/`, `%252e`), Krawl returns convincing fake file contents with realistic system users, UIDs, GIDs, and shell configurations. This wastes attacker time while logging the full attack pattern.
-
-### XXE (XML External Entity) Injection
-The `/api/xml` and `/api/parser` endpoints accept XML input and are designed to detect **XXE injection** attempts. When attackers try to exploit external entity declarations (`<!ENTITY`, `<!DOCTYPE`, `SYSTEM`) or reference entities to access local files, Krawl responds with realistic XML responses that appear to process the entities successfully. The honeypot returns fake file contents, simulated entity values (like `admin_credentials` or `database_connection`), or realistic error messages, making the attack appear successful while fully logging the payload.
-
-### Command Injection Detection
-Pages like `/api/exec`, `/api/run`, and `/api/system` simulate command execution endpoints vulnerable to **command injection**. When attackers attempt to inject shell commands using patterns like `; whoami`, `| cat /etc/passwd`, or backticks, Krawl responds with realistic command outputs. For example, `whoami` returns fake usernames like `www-data` or `nginx`, while `uname` returns fake Linux kernel versions. Network commands like `wget` or `curl` simulate downloads or return "command not found" errors, creating believable responses that delay and confuse automated exploitation tools.
-## Example usage behind reverse proxy
-
-You can configure a reverse proxy so all web requests land on the Krawl page by default, and hide your real content behind a secret hidden url. For example:
-
-```bash
-location / {
-    proxy_pass https://your-krawl-instance;
-    proxy_pass_header Server;
-}
-
-location /my-hidden-service {
-    proxy_pass https://my-hidden-service;
-    proxy_pass_header Server;
-}
-```
-
-Alternatively, you can create a bunch of different "interesting" looking domains. For example:
-
-- admin.example.com
-- portal.example.com
-- sso.example.com
-- login.example.com
-- ...
-
-Additionally, you may configure your reverse proxy to forward all non-existing subdomains (e.g. nonexistent.example.com) to one of these domains so that any crawlers that are guessing domains at random will automatically end up at your Krawl instance.
-
-## Enable database dump job for backups
-
-To enable the database dump job, set the following variables (*config file example*)
-
-```yaml
-backups:
-    path: "backups" # where backup will be saved
-    cron: "*/30 * * * *" # frequency of the cronjob
-    enabled: true
-```
-
-
-## Customizing the Canary Token
-
-To create a custom canary token, visit https://canarytokens.org
-
-and generate a “Web bug” canary token.
-
-This optional token is triggered when a crawler fully traverses the webpage until it reaches 0. At that point, a URL is returned. When this URL is requested, it sends an alert to the user via email, including the visitor’s IP address and user agent.
-
-
-To enable this feature, set the canary token URL [using the environment variable](#configuration-via-environment-variables) `KRAWL_CANARY_TOKEN_URL`.
-
-## Customizing the wordlist
-
-Edit `wordlists.json` to customize fake data for your use case
-
-```json
-{
-  "usernames": {
-    "prefixes": ["admin", "root", "user"],
-    "suffixes": ["_prod", "_dev", "123"]
-  },
-  "passwords": {
-    "prefixes": ["P@ssw0rd", "Admin"],
-    "simple": ["test", "password"]
-  },
-  "directory_listing": {
-    "files": ["credentials.txt", "backup.sql"],
-    "directories": ["admin/", "backup/"]
-  }
-}
-```
-
-or **values.yaml** in the case of helm chart installation
-
-## Dashboard
-
-Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`
-
-The dashboard shows:
-- Total and unique accesses
-- Suspicious activity and attack detection
-- Top IPs, paths, user-agents and GeoIP localization
-- Real-time monitoring
-
-The attackers’ access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
-
-Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
-
-![dashboard-1](img/dashboard-1.png)
-
-The top IP Addresses is shown along with top paths and User Agents
-
-![dashboard-2](img/dashboard-2.png)
-
-![dashboard-3](img/dashboard-3.png)
+| Topic | Description |
+|-------|-------------|
+| [API](docs/api.md) | External APIs used by Krawl for IP data, reputation, and geolocation |
+| [Honeypot](docs/honeypot.md) | Full overview of honeypot pages: fake logins, directory listings, credential files, SQLi/XSS/XXE/command injection traps, and more |
+| [Reverse Proxy](docs/reverse-proxy.md) | How to deploy Krawl behind NGINX or use decoy subdomains |
+| [Database Backups](docs/backups.md) | Enable and configure the automatic database dump job |
+| [Canary Token](docs/canary-token.md) | Set up external alert triggers via canarytokens.org |
+| [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings |
+| [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard |

 ## 🤝 Contributing