docs: add comprehensive documentation for API, backups, canary token, dashboard, honeypot, reverse proxy, and wordlist customization
This commit is contained in:
9
docs/api.md
Normal file
9
docs/api.md
Normal file
@@ -0,0 +1,9 @@
|
||||
# API
|
||||
|
||||
Krawl uses the following APIs
|
||||
- http://ip-api.com (IP Data)
|
||||
- https://iprep.lcrawl.com (IP Reputation)
|
||||
- https://nominatim.openstreetmap.org/reverse (Reverse IP Lookup)
|
||||
- https://api.ipify.org (Public IP discovery)
|
||||
- http://ident.me (Public IP discovery)
|
||||
- https://ifconfig.me (Public IP discovery)
|
||||
10
docs/backups.md
Normal file
10
docs/backups.md
Normal file
@@ -0,0 +1,10 @@
|
||||
# Enable Database Dump Job for Backups
|
||||
|
||||
To enable the database dump job, set the following variables (*config file example*)
|
||||
|
||||
```yaml
|
||||
backups:
|
||||
path: "backups" # where backup will be saved
|
||||
cron: "*/30 * * * *" # frequency of the cronjob
|
||||
enabled: true
|
||||
```
|
||||
10
docs/canary-token.md
Normal file
10
docs/canary-token.md
Normal file
@@ -0,0 +1,10 @@
|
||||
# Customizing the Canary Token
|
||||
|
||||
To create a custom canary token, visit https://canarytokens.org
|
||||
|
||||
and generate a "Web bug" canary token.
|
||||
|
||||
This optional token is triggered when a crawler fully traverses the webpage until it reaches 0. At that point, a URL is returned. When this URL is requested, it sends an alert to the user via email, including the visitor's IP address and user agent.
|
||||
|
||||
|
||||
To enable this feature, set the canary token URL [using the environment variable](../README.md#configuration-via-enviromental-variables) `KRAWL_CANARY_TOKEN_URL`.
|
||||
21
docs/dashboard.md
Normal file
21
docs/dashboard.md
Normal file
@@ -0,0 +1,21 @@
|
||||
# Dashboard
|
||||
|
||||
Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`
|
||||
|
||||
The dashboard shows:
|
||||
- Total and unique accesses
|
||||
- Suspicious activity and attack detection
|
||||
- Top IPs, paths, user-agents and GeoIP localization
|
||||
- Real-time monitoring
|
||||
|
||||
The attackers' access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
|
||||
|
||||
Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
|
||||
|
||||

|
||||
|
||||
The top IP Addresses is shown along with top paths and User Agents
|
||||
|
||||

|
||||
|
||||

|
||||
52
docs/honeypot.md
Normal file
52
docs/honeypot.md
Normal file
@@ -0,0 +1,52 @@
|
||||
# Honeypot
|
||||
|
||||
Below is a complete overview of the Krawl honeypot's capabilities
|
||||
|
||||
## robots.txt
|
||||
The actual (juicy) robots.txt configuration [is the following](../src/templates/html/robots.txt).
|
||||
|
||||
## Honeypot pages
|
||||
|
||||
### Common Login Attempts
|
||||
Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).
|
||||
|
||||

|
||||
|
||||
### Common Misconfiguration Paths
|
||||
Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with "interesting" files, each assigned a random file size to look realistic.
|
||||
|
||||

|
||||
|
||||
### Environment File Leakage
|
||||
The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a "juicy" misconfiguration that crawlers and scanners often flag as information leakage.
|
||||
|
||||
### Server Error Information
|
||||
The `/server` page displays randomly generated fake error information for each known server.
|
||||
|
||||

|
||||
|
||||
### API Endpoints with Sensitive Data
|
||||
The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format
|
||||
|
||||

|
||||
|
||||
### Exposed Credential Files
|
||||
The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets
|
||||
|
||||

|
||||
|
||||
### SQL Injection and XSS Detection
|
||||
Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.
|
||||
|
||||

|
||||
|
||||
Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
|
||||
|
||||
### Path Traversal Detection
|
||||
Krawl detects and responds to **path traversal** attempts targeting common system files like `/etc/passwd`, `/etc/shadow`, or Windows system paths. When an attacker tries to access sensitive files using patterns like `../../../etc/passwd` or encoded variants (`%2e%2e/`, `%252e`), Krawl returns convincing fake file contents with realistic system users, UIDs, GIDs, and shell configurations. This wastes attacker time while logging the full attack pattern.
|
||||
|
||||
### XXE (XML External Entity) Injection
|
||||
The `/api/xml` and `/api/parser` endpoints accept XML input and are designed to detect **XXE injection** attempts. When attackers try to exploit external entity declarations (`<!ENTITY`, `<!DOCTYPE`, `SYSTEM`) or reference entities to access local files, Krawl responds with realistic XML responses that appear to process the entities successfully. The honeypot returns fake file contents, simulated entity values (like `admin_credentials` or `database_connection`), or realistic error messages, making the attack appear successful while fully logging the payload.
|
||||
|
||||
### Command Injection Detection
|
||||
Pages like `/api/exec`, `/api/run`, and `/api/system` simulate command execution endpoints vulnerable to **command injection**. When attackers attempt to inject shell commands using patterns like `; whoami`, `| cat /etc/passwd`, or backticks, Krawl responds with realistic command outputs. For example, `whoami` returns fake usernames like `www-data` or `nginx`, while `uname` returns fake Linux kernel versions. Network commands like `wget` or `curl` simulate downloads or return "command not found" errors, creating believable responses that delay and confuse automated exploitation tools.
|
||||
25
docs/reverse-proxy.md
Normal file
25
docs/reverse-proxy.md
Normal file
@@ -0,0 +1,25 @@
|
||||
# Example Usage Behind Reverse Proxy
|
||||
|
||||
You can configure a reverse proxy so all web requests land on the Krawl page by default, and hide your real content behind a secret hidden url. For example:
|
||||
|
||||
```bash
|
||||
location / {
|
||||
proxy_pass https://your-krawl-instance;
|
||||
proxy_pass_header Server;
|
||||
}
|
||||
|
||||
location /my-hidden-service {
|
||||
proxy_pass https://my-hidden-service;
|
||||
proxy_pass_header Server;
|
||||
}
|
||||
```
|
||||
|
||||
Alternatively, you can create a bunch of different "interesting" looking domains. For example:
|
||||
|
||||
- admin.example.com
|
||||
- portal.example.com
|
||||
- sso.example.com
|
||||
- login.example.com
|
||||
- ...
|
||||
|
||||
Additionally, you may configure your reverse proxy to forward all non-existing subdomains (e.g. nonexistent.example.com) to one of these domains so that any crawlers that are guessing domains at random will automatically end up at your Krawl instance.
|
||||
22
docs/wordlist.md
Normal file
22
docs/wordlist.md
Normal file
@@ -0,0 +1,22 @@
|
||||
# Customizing the Wordlist
|
||||
|
||||
Edit `wordlists.json` to customize fake data for your use case
|
||||
|
||||
```json
|
||||
{
|
||||
"usernames": {
|
||||
"prefixes": ["admin", "root", "user"],
|
||||
"suffixes": ["_prod", "_dev", "123"]
|
||||
},
|
||||
"passwords": {
|
||||
"prefixes": ["P@ssw0rd", "Admin"],
|
||||
"simple": ["test", "password"]
|
||||
},
|
||||
"directory_listing": {
|
||||
"files": ["credentials.txt", "backup.sql"],
|
||||
"directories": ["admin/", "backup/"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
or **values.yaml** in the case of helm chart installation
|
||||
Reference in New Issue
Block a user