docs: add comprehensive documentation for API, backups, canary token, dashboard, honeypot, reverse proxy, and wordlist customization
187
README.md
@@ -33,19 +33,6 @@
<img src="https://img.shields.io/badge/helm-chart-0F1689?logo=helm&logoColor=white" alt="Helm Chart">
</a>
</div>

<br>

<p align="center">
<a href="#what-is-krawl">What is Krawl?</a> •
<a href="#-installation">Installation</a> •
<a href="#honeypot-pages">Honeypot Pages</a> •
<a href="#dashboard">Dashboard</a> •
<a href="./ToDo.md">Todo</a> •
<a href="#-contributing">Contributing</a>
</p>

<br>
</div>

## Table of Contents

@@ -62,15 +49,7 @@
- [Ban Malicious IPs](#use-krawl-to-ban-malicious-ips)
- [IP Reputation](#ip-reputation)
- [Forward Server Header](#forward-server-header)
- [API](#api)
- [Honeypot](#honeypot)
- [robots.txt](#robotstxt)
- [Honeypot Pages](#honeypot-pages)
- [Reverse Proxy Usage](#example-usage-behind-reverse-proxy)
- [Database Backups](#enable-database-dump-job-for-backups)
- [Canary Token](#customizing-the-canary-token)
- [Customizing the Wordlist](#customizing-the-wordlist)
- [Dashboard](#dashboard)
- [Additional Documentation](#additional-documentation)
- [Contributing](#-contributing)

## Demo

@@ -92,7 +71,7 @@ It features:
- **Fake Login Pages**: WordPress, phpMyAdmin, admin panels
- **Honeypot Paths**: Advertised in robots.txt to catch scanners
- **Fake Credentials**: Realistic-looking usernames, passwords, API keys
- **[Canary Token](#customizing-the-canary-token) Integration**: External alert triggering
- **[Canary Token](docs/canary-token.md) Integration**: External alert triggering
- **Random Server Headers**: Confuse attacks that rely on the server header and version
- **Real-time Dashboard**: Monitor suspicious activity
- **Customizable Wordlists**: Easy JSON-based configuration

@@ -289,159 +268,17 @@ location / {
}
```

## API
Krawl uses the following APIs:
- http://ip-api.com (IP Data)
- https://iprep.lcrawl.com (IP Reputation)
- https://nominatim.openstreetmap.org/reverse (Reverse Geocoding)
- https://api.ipify.org (Public IP discovery)
- http://ident.me (Public IP discovery)
- https://ifconfig.me (Public IP discovery)
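
The three public-IP services above can back each other up. The sketch below is illustrative, not Krawl's implementation: it assumes each endpoint answers with the caller's address as plain text, and `PUBLIC_IP_SERVICES`, `fetch_text` and `discover_public_ip` are hypothetical names. Every response is validated with `ipaddress.ip_address`, so an HTML page or error body simply falls through to the next service.

```python
import ipaddress
import urllib.request
from typing import Callable, Iterable, Optional

# Endpoints from the list above; trying them in this order is an assumption.
PUBLIC_IP_SERVICES = [
    "https://api.ipify.org",
    "http://ident.me",
    "https://ifconfig.me",
]


def fetch_text(url: str, timeout: float = 5.0) -> str:
    """Download a URL and return its body decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def discover_public_ip(
    services: Iterable[str] = PUBLIC_IP_SERVICES,
    fetch: Callable[[str], str] = fetch_text,
) -> Optional[str]:
    """Return the first service response that is a valid IPv4/IPv6 address."""
    for url in services:
        try:
            body = fetch(url).strip()
            # ip_address raises ValueError for HTML pages or other junk.
            return str(ipaddress.ip_address(body))
        except (OSError, ValueError):
            # URLError and timeouts are OSError subclasses: try the next service.
            continue
    return None
```

Calling `discover_public_ip()` returns the first valid address received, or `None` if every service fails. Passing a custom `fetch` callable keeps the function testable without network access.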

## Additional Documentation

# Honeypot

Below is a complete overview of the Krawl honeypot’s capabilities.

## robots.txt

The actual (juicy) robots.txt configuration is [available here](src/templates/html/robots.txt).
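
The trick works because `Disallow` rules are only advisory: well-behaved crawlers skip those paths, while scanners often read them as a list of targets. Here is a minimal sketch of generating such a file, assuming a hand-picked list of the honeypot paths described in this README. It is not a copy of `src/templates/html/robots.txt`, and `render_robots_txt` is a hypothetical helper.

```python
from typing import Iterable

# Illustrative selection of honeypot paths described in this README;
# the real template may list different ones.
HONEYPOT_PATHS = [
    "/admin/",
    "/wp-admin/",
    "/phpMyAdmin/",
    "/backup/",
    "/config/",
    "/credentials.txt",
    "/passwords.txt",
]


def render_robots_txt(paths: Iterable[str], user_agent: str = "*") -> str:
    """Build a robots.txt whose Disallow lines double as bait for scanners."""
    lines = [f"User-agent: {user_agent}"]
    lines.extend(f"Disallow: {path}" for path in paths)
    return "\n".join(lines) + "\n"


print(render_robots_txt(HONEYPOT_PATHS), end="")
```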

## Honeypot pages

### Common Login Attempts

Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).
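
Here is a minimal sketch of that flow. It assumes a framework-agnostic handler that receives the client IP, the submitted form fields (assumed to be named `username` and `password`) and the request headers. `FakeLogin`, `LoginAttempt` and the rejection message are hypothetical, and the in-memory list stands in for whatever storage the dashboard actually reads from.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LoginAttempt:
    """What the README says gets logged: credentials, IP, headers, timing."""
    ip: str
    username: str
    password: str
    headers: Dict[str, str]
    received_at: float


@dataclass
class FakeLogin:
    delay_seconds: float = 1.0  # the simulated "processing" time
    attempts: List[LoginAttempt] = field(default_factory=list)

    def handle(self, ip: str, form: Dict[str, str], headers: Dict[str, str]) -> str:
        """Record the attempt, wait, then reject it like a real login form."""
        self.attempts.append(
            LoginAttempt(
                ip=ip,
                username=form.get("username", ""),
                password=form.get("password", ""),
                headers=dict(headers),
                received_at=time.time(),
            )
        )
        time.sleep(self.delay_seconds)
        return "Invalid username or password."
```

Every attempt is recorded before the delay and always fails, so the attacker learns nothing while the tried credentials are kept. Setting `delay_seconds=0` is handy in tests.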

### Common Misconfiguration Paths

Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with “interesting” files, each assigned a random file size to look realistic.
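
Generating such a listing only requires a random size per file. The sketch assumes filenames taken from this README, an arbitrary 512 B to 50 MiB size range, and a human-readable size format. `fake_listing` and `human_size` are hypothetical names. Passing a seeded `random.Random` makes the output reproducible.

```python
import random
from typing import List, Optional, Tuple

# Filenames mentioned elsewhere in this README; the size range is arbitrary.
DEFAULT_FILES = ["credentials.txt", "backup.sql", "passwords.txt"]
MIN_SIZE = 512
MAX_SIZE = 50 * 1024 * 1024


def fake_listing(
    files: List[str], rng: Optional[random.Random] = None
) -> List[Tuple[str, int]]:
    """Pair each filename with a random size in bytes (bounds inclusive)."""
    rng = rng or random.Random()
    return [(name, rng.randint(MIN_SIZE, MAX_SIZE)) for name in files]


def human_size(num_bytes: int) -> str:
    """Format a byte count as a short human-readable size, e.g. 1536 -> '1.5K'."""
    if num_bytes < 1024:
        return str(num_bytes)
    size = num_bytes / 1024
    for unit in ("K", "M"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}G"


for name, size in fake_listing(DEFAULT_FILES):
    print(f"{name:<20} {human_size(size):>8}")
```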

### Environment File Leakage

The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error because its `Content-Type` is `application/json` instead of plain text, mimicking a "juicy" misconfiguration that crawlers and scanners often flag as information leakage.

### Server Error Information

The `/server` page displays randomly generated fake error information for each known server.

### API Endpoints with Sensitive Data

The pages `/api/v1/users` and `/api/v2/secrets` return fake users and random secrets in JSON format.

### Exposed Credential Files

The pages `/credentials.txt` and `/passwords.txt` contain fake users and random secrets.

### SQL Injection and XSS Detection

Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.

Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
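
Here is a minimal sketch of the "different error on every request" idea, using a small regular expression as the detector. The pattern catches only a handful of classic payloads and will also flag harmless input such as `O'Brien`. The error strings are imitations of common database messages chosen for illustration, and `respond_to_query` is a hypothetical name, not Krawl's API.

```python
import random
import re
from typing import Optional

# A few classic SQL-injection markers; real detection needs much more.
SQLI_PATTERN = re.compile(
    r"('|--|;|/\*|\bunion\s+(?:all\s+)?select\b|\bor\s+\d+\s*=\s*\d+|\bsleep\s*\()",
    re.IGNORECASE,
)

# Messages imitating several database engines.
FAKE_DB_ERRORS = [
    "You have an error in your SQL syntax; check the manual that corresponds "
    "to your MySQL server version for the right syntax to use",
    "ERROR: syntax error at end of input",
    "ORA-00933: SQL command not properly ended",
    "Unclosed quotation mark after the character string",
    "SQLite error: unrecognized token",
]


def respond_to_query(value: str, rng: Optional[random.Random] = None) -> Optional[str]:
    """Return a random fake DB error if `value` looks like SQL injection, else None."""
    if not SQLI_PATTERN.search(value):
        return None
    return (rng or random.Random()).choice(FAKE_DB_ERRORS)
```

Because the error is drawn at random on each call, repeating the same payload yields a different-looking backend, which is what makes automated fingerprinting noisy.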

### Path Traversal Detection

Krawl detects and responds to **path traversal** attempts targeting common system files like `/etc/passwd`, `/etc/shadow`, or Windows system paths. When an attacker tries to access sensitive files using patterns like `../../../etc/passwd` or encoded variants (`%2e%2e/`, `%252e`), Krawl returns convincing fake file contents with realistic system users, UIDs, GIDs, and shell configurations. This wastes attacker time while logging the full attack pattern.
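
The easiest way to catch encoded variants such as `%2e%2e/` and the double-encoded `%252e` is to percent-decode until the string stops changing, then look for `../` or a known target. The sketch below uses only `urllib.parse.unquote`. The target list and the three-round decoding limit are assumptions, and `is_path_traversal` is a hypothetical helper.

```python
import re
from urllib.parse import unquote

# Targets named in the README plus two classic Windows files (an assumption).
SENSITIVE_FILES = re.compile(
    r"(etc/passwd|etc/shadow|windows/win\.ini|boot\.ini)", re.IGNORECASE
)


def fully_decode(path: str, max_rounds: int = 3) -> str:
    """Percent-decode repeatedly so %252e -> %2e -> '.' is fully unwrapped."""
    for _ in range(max_rounds):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    return path


def is_path_traversal(raw_path: str) -> bool:
    """Flag '../' sequences or well-known system files, even when encoded."""
    # Normalize backslash separators so Windows-style traversal is caught too.
    path = fully_decode(raw_path).replace("\\", "/")
    return "../" in path or bool(SENSITIVE_FILES.search(path))
```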

### XXE (XML External Entity) Injection

The `/api/xml` and `/api/parser` endpoints accept XML input and are designed to detect **XXE injection** attempts. When attackers try to exploit external entity declarations (`<!ENTITY`, `<!DOCTYPE`, `SYSTEM`) or reference entities to access local files, Krawl responds with realistic XML responses that appear to process the entities successfully. The honeypot returns fake file contents, simulated entity values (like `admin_credentials` or `database_connection`), or realistic error messages, making the attack appear successful while fully logging the payload.
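
Detection can run on the raw request body with plain pattern matching. This has the side benefit that no XML parser ever resolves the attacker's entities. Here is a minimal sketch, assuming the three markers listed above are enough to flag a request. `looks_like_xxe` and `external_entities` are hypothetical helpers, and a bare `SYSTEM` match can also fire on innocent documents that happen to contain that word.

```python
import re
from typing import List, Tuple

# Markers named in the README: entity declarations, DOCTYPE and SYSTEM.
XXE_MARKERS = re.compile(r"<!ENTITY|<!DOCTYPE|\bSYSTEM\b", re.IGNORECASE)

# Captures the entity name and SYSTEM URI, including parameter entities (%).
EXTERNAL_ENTITY = re.compile(
    r"""<!ENTITY\s+(?:%\s*)?(\w+)\s+SYSTEM\s+["']([^"']+)["']""", re.IGNORECASE
)


def looks_like_xxe(xml_body: str) -> bool:
    """True if the payload contains any of the XXE markers."""
    return bool(XXE_MARKERS.search(xml_body))


def external_entities(xml_body: str) -> List[Tuple[str, str]]:
    """Return (entity name, SYSTEM URI) pairs so the payload can be logged."""
    return EXTERNAL_ENTITY.findall(xml_body)
```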

### Command Injection Detection

Pages like `/api/exec`, `/api/run`, and `/api/system` simulate command execution endpoints vulnerable to **command injection**. When attackers attempt to inject shell commands using patterns like `; whoami`, `| cat /etc/passwd`, or backticks, Krawl responds with realistic command outputs. For example, `whoami` returns fake usernames like `www-data` or `nginx`, while `uname` returns fake Linux kernel versions. Network commands like `wget` or `curl` simulate downloads or return "command not found" errors, creating believable responses that delay and confuse automated exploitation tools.
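
Here is a minimal sketch of returning canned output for injected commands. It uses a regular expression that looks for the separators mentioned above (`;`, `|`, backticks), plus `&&` and `$(`, which are additions. The `www-data`/`nginx` usernames come from this README, while the kernel strings and the `/etc/passwd` excerpt are made up. Unknown commands (including `wget` and `curl` in this sketch) get a shell-style "not found" error. `fake_command_output` is a hypothetical name.

```python
import random
import re
from typing import Optional

# A separator followed by the first word of the injected command.
INJECTION = re.compile(r"(;|\|\|?|&&|`|\$\()\s*([A-Za-z]+)")

# Canned responses; one is picked at random where several exist.
FAKE_OUTPUTS = {
    "whoami": ["www-data", "nginx"],
    "id": ["uid=33(www-data) gid=33(www-data) groups=33(www-data)"],
    "uname": [
        "Linux web01 5.15.0-91-generic x86_64 GNU/Linux",
        "Linux app-02 4.19.0-25-amd64 x86_64 GNU/Linux",
    ],
    "cat": [
        "root:x:0:0:root:/root:/bin/bash\n"
        "www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin"
    ],
}


def fake_command_output(param: str, rng: Optional[random.Random] = None) -> Optional[str]:
    """Return believable output for the first injected command, or None."""
    match = INJECTION.search(param)
    if match is None:
        return None
    command = match.group(2).lower()
    if command in FAKE_OUTPUTS:
        return (rng or random.Random()).choice(FAKE_OUTPUTS[command])
    # Anything else, e.g. wget or curl, pretends the binary is missing.
    return f"sh: 1: {command}: not found"
```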

## Example usage behind reverse proxy

You can configure a reverse proxy so that all web requests land on Krawl by default, while your real content stays behind a secret, hidden URL. For example:

```nginx
# Default route: every request is served by Krawl
location / {
    proxy_pass https://your-krawl-instance;
    # nginx drops the upstream Server header by default; pass Krawl's through
    proxy_pass_header Server;
}

# Real service, reachable only via the secret path
location /my-hidden-service {
    proxy_pass https://my-hidden-service;
    proxy_pass_header Server;
}
```

Alternatively, you can set up several "interesting"-looking domains, for example:

- admin.example.com
- portal.example.com
- sso.example.com
- login.example.com
- ...

You can also configure your reverse proxy to forward all nonexistent subdomains (e.g. nonexistent.example.com) to one of these domains, so that crawlers guessing subdomains at random automatically end up at your Krawl instance.

## Enable database dump job for backups

To enable the database dump job, set the following options (*config file example*):

```yaml
backups:
  path: "backups"        # directory where backups are saved
  cron: "*/30 * * * *"   # cron schedule for the dump job
  enabled: true
```
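
The `cron` value appears to use standard five-field cron syntax (minute, hour, day of month, month, day of week), so `*/30 * * * *` fires at minutes 0 and 30 of every hour. The sketch below expands a single field to make that concrete. `expand_cron_field` is a hypothetical helper that handles only `*`, `*/n`, `a-b`, `a-b/n` and comma lists, not the full cron grammar.

```python
from typing import List


def expand_cron_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field into the sorted values it matches."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high  # whole allowed range
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)


# Minute field of the example schedule: runs at :00 and :30.
print(expand_cron_field("*/30", 0, 59))  # [0, 30]
```

With `*/30` in the minute field and `*` everywhere else, the dump runs 48 times a day (twice per hour). A schedule such as `0 */6 * * *` would instead run it four times a day.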

## Customizing the Canary Token

To create a custom canary token, visit https://canarytokens.org and generate a “Web bug” canary token.

This optional token is triggered when a crawler fully traverses the webpage until it reaches 0. At that point, a URL is returned. When that URL is requested, it triggers an email alert that includes the visitor’s IP address and user agent.

To enable this feature, set the canary token URL [using the environment variable](#configuration-via-environment-variables) `KRAWL_CANARY_TOKEN_URL`.
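
The gating logic can be sketched as follows. Only the variable name `KRAWL_CANARY_TOKEN_URL` comes from this README. Treating "until it reaches 0" as a per-crawler countdown of remaining pages is an interpretation, and `canary_link` is a hypothetical helper. Leaving the variable unset disables the feature.

```python
import os
from typing import Mapping, Optional


def canary_link(remaining_pages: int, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the canary URL once the (assumed) countdown reaches zero."""
    url = env.get("KRAWL_CANARY_TOKEN_URL")
    if not url:
        return None  # feature is optional: no variable, no canary link
    return url if remaining_pages <= 0 else None
```

Because the URL is served only at the end of the crawl, a request to it (and the resulting alert) indicates that something followed the site all the way through.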

## Customizing the wordlist

Edit `wordlists.json` to customize the fake data for your use case:

```json
{
  "usernames": {
    "prefixes": ["admin", "root", "user"],
    "suffixes": ["_prod", "_dev", "123"]
  },
  "passwords": {
    "prefixes": ["P@ssw0rd", "Admin"],
    "simple": ["test", "password"]
  },
  "directory_listing": {
    "files": ["credentials.txt", "backup.sql"],
    "directories": ["admin/", "backup/"]
  }
}
```

If you installed Krawl with the Helm chart, edit **values.yaml** instead.
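
Here is a minimal sketch of how such a file could be turned into fake credentials. The combination rules (prefix plus suffix for usernames; either a `simple` entry or a prefix followed by digits for passwords) are assumptions for illustration, not necessarily how Krawl uses the lists, and the function names are hypothetical.

```python
import json
import random
from typing import Any, Dict, Optional

Wordlists = Dict[str, Any]


def load_wordlists(path: str = "wordlists.json") -> Wordlists:
    """Read the JSON wordlist file shown above."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def fake_username(words: Wordlists, rng: Optional[random.Random] = None) -> str:
    """Assumed rule: one prefix followed by one suffix, e.g. 'admin_prod'."""
    rng = rng or random.Random()
    users = words["usernames"]
    return rng.choice(users["prefixes"]) + rng.choice(users["suffixes"])


def fake_password(words: Wordlists, rng: Optional[random.Random] = None) -> str:
    """Assumed rule: a 'simple' password half the time, else prefix + 1-3 digits."""
    rng = rng or random.Random()
    pw = words["passwords"]
    if rng.random() < 0.5:
        return rng.choice(pw["simple"])
    return rng.choice(pw["prefixes"]) + str(rng.randint(1, 999))
```

With the example file above, `fake_username(load_wordlists())` might return `root_dev`, and `fake_password` might return `test` or something like `Admin42`.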

## Dashboard

Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`.

The dashboard shows:
- Total and unique accesses
- Suspicious activity and attack detection
- Top IPs, paths, user agents and GeoIP localization
- Real-time monitoring

Attackers’ requests to honeypot endpoints, along with related suspicious activity (such as failed login attempts), are logged.

Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
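
The README does not describe Krawl's actual features or weights, so the signals, weights and threshold below are purely illustrative. `IpActivity`, `suspicion_score` and `classify` are hypothetical names. The sketch shows only the general shape of a weighted score: attack signals push it up, and ordinary browsing pulls it down.

```python
from dataclasses import dataclass


@dataclass
class IpActivity:
    """Hypothetical per-IP counters collected from the request log."""
    honeypot_hits: int = 0     # requests to trap paths such as /admin/ or /.env
    failed_logins: int = 0     # submissions to the fake login pages
    attacks_detected: int = 0  # SQLi, XSS, traversal, XXE, command injection
    normal_requests: int = 0   # everything else


# Illustrative weights; a negative weight lowers the score for ordinary traffic.
WEIGHTS = {
    "honeypot_hits": 2.0,
    "failed_logins": 3.0,
    "attacks_detected": 5.0,
    "normal_requests": -0.5,
}


def suspicion_score(activity: IpActivity) -> float:
    """Weighted sum of the counters above."""
    return sum(weight * getattr(activity, name) for name, weight in WEIGHTS.items())


def classify(activity: IpActivity, threshold: float = 10.0) -> str:
    """Label an IP by comparing its score with an arbitrary threshold."""
    return "malicious" if suspicion_score(activity) >= threshold else "legitimate"
```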

The top IP addresses are shown along with the top paths and user agents.

| Topic | Description |
|-------|-------------|
| [API](docs/api.md) | External APIs used by Krawl for IP data, reputation, and geolocation |
| [Honeypot](docs/honeypot.md) | Full overview of honeypot pages: fake logins, directory listings, credential files, SQLi/XSS/XXE/command injection traps, and more |
| [Reverse Proxy](docs/reverse-proxy.md) | How to deploy Krawl behind NGINX or use decoy subdomains |
| [Database Backups](docs/backups.md) | Enable and configure the automatic database dump job |
| [Canary Token](docs/canary-token.md) | Set up external alert triggers via canarytokens.org |
| [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings |
| [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard |

## 🤝 Contributing
