- A modern, customizable zero-dependencies honeypot server designed to detect and track malicious activity through deceptive web pages, fake credentials, and canary tokens.
+ A modern, customizable web honeypot server designed to detect and track malicious activity from attackers and web crawlers through deceptive web pages, fake credentials, and canary tokens.
@@ -55,7 +55,7 @@ Tip: crawl the `robots.txt` paths for additional fun
## What is Krawl?
-**Krawl** is a cloud‑native deception server designed to detect, delay, and analyze malicious web crawlers and automated scanners.
+**Krawl** is a cloud‑native deception server designed to detect, delay, and analyze attackers, malicious web crawlers, and automated scanners.
It creates realistic fake web applications filled with low‑hanging fruit such as admin panels, configuration files, and exposed fake credentials to attract and identify suspicious activity.
@@ -68,11 +68,14 @@ It features:
- **Honeypot Paths**: Advertised in robots.txt to catch scanners
- **Fake Credentials**: Realistic-looking usernames, passwords, API keys
- **[Canary Token](#customizing-the-canary-token) Integration**: External alert triggering
+- **Random server headers**: Confuse fingerprinting attacks that rely on the Server header and version
- **Real-time Dashboard**: Monitor suspicious activity
- **Customizable Wordlists**: Easy JSON-based configuration
- **Random Error Injection**: Mimic real server behavior
-
+
+
+
## 🚀 Installation
@@ -127,149 +130,98 @@ Stop with:
docker-compose down
```
-### Helm Chart
-
-Install with default values:
-
-```bash
-helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
- --version 2.0.0 \
- --namespace krawl-system \
- --create-namespace
-```
-
-Or create a minimal `values.yaml` file:
-
-```yaml
-service:
- type: LoadBalancer
- port: 5000
-
-ingress:
- enabled: true
- className: "traefik"
- hosts:
- - host: krawl.example.com
- paths:
- - path: /
- pathType: Prefix
-
-config:
- server:
- port: 5000
- delay: 100
- dashboard:
- secret_path: null # Auto-generated if not set
-
-database:
- persistence:
- enabled: true
- size: 1Gi
-```
-
-Install with custom values:
-
-```bash
-helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
- --version 2.0.0 \
- --namespace krawl-system \
- --create-namespace \
- -f values.yaml
-```
-
-To access the deception server:
-
-```bash
-kubectl get svc krawl -n krawl-system
-```
-
-Once the EXTERNAL-IP is assigned, access your deception server at `http://:5000`
-
### Kubernetes
+**Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [via the Helm chart](helm/README.md).
-Apply all manifests with:
+## Use Krawl to Ban Malicious IPs
+Krawl uses a reputation-based system to classify attacker IP addresses. Every five minutes, Krawl exports the identified malicious IPs to a `malicious_ips.txt` file.
+
+This file can either be shared from the Docker container via a volume mount or downloaded directly via `curl`:
```bash
-kubectl apply -f https://raw.githubusercontent.com/BlessedRebuS/Krawl/refs/heads/main/kubernetes/krawl-all-in-one-deploy.yaml
+curl https://your-krawl-instance/api/download/malicious_ips.txt
```
-Or clone the repo and apply the manifest:
+This file can be used to [update a set of firewall rules](https://www.allthingstech.ch/using-opnsense-and-ip-blocklists-to-block-malicious-traffic), for example on OPNsense, pfSense, or iptables, enabling automatic blocking of malicious IPs.
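As a sketch, the exported file can be turned into firewall rules with a few lines of Python (the `iptables` invocation below is illustrative and not part of Krawl; adapt it to your firewall, and fetch the real `malicious_ips.txt` from your instance instead of the sample written here):

```python
# Hypothetical sketch: turn Krawl's exported IP list into iptables DROP rules.
# A sample file is written locally so the snippet is self-contained.
from pathlib import Path

Path("malicious_ips.txt").write_text("203.0.113.7\n198.51.100.22\n")

rules = [
    f"iptables -A INPUT -s {ip} -j DROP"
    for ip in Path("malicious_ips.txt").read_text().split()
]
for rule in rules:
    print(rule)  # review the rules, then apply them via a shell or subprocess
```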
+
+## IP Reputation
+Krawl [uses tasks that analyze recent traffic to build and continuously update an IP reputation](src/tasks/analyze_ips.py) score. It runs periodically and evaluates each active IP address based on multiple behavioral indicators to classify it as an attacker, crawler, or regular user. Thresholds are fully customizable.
+
+
+
+The analysis includes:
+- **Risky HTTP methods usage** (e.g. POST, PUT, DELETE ratios)
+- **Robots.txt violations**
+- **Request timing anomalies** (bursty or irregular patterns)
+- **User-Agent consistency**
+- **Attack URL detection** (e.g. SQL injection, XSS patterns)
+
+Each signal contributes to a weighted scoring model that assigns a reputation category:
+- `attacker`
+- `bad_crawler`
+- `good_crawler`
+- `regular_user`
+- `unknown` (for insufficient data)
+
+The resulting scores and metrics are stored in the database and used by Krawl to drive dashboards, reputation tracking, and automated mitigation actions such as IP banning or firewall integration.
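A simplified sketch of the weighted-scoring idea follows. The signal names, weights, and thresholds here are illustrative only (the real logic lives in `src/tasks/analyze_ips.py`), and the `good_crawler` category, which needs positive evidence such as a verified bot User-Agent, is omitted:

```python
# Illustrative weighted-scoring sketch; not Krawl's actual values.
WEIGHTS = {
    "risky_methods": 0.25,      # POST/PUT/DELETE ratio
    "robots_violations": 0.30,  # requests to robots.txt-disallowed paths
    "timing_anomaly": 0.15,     # bursty or irregular request timing
    "ua_inconsistency": 0.10,   # multiple User-Agents from one IP
    "attack_urls": 0.20,        # SQLi/XSS patterns in requested URLs
}

def classify(signals: dict) -> str:
    """Combine normalized 0..1 signals into a reputation category."""
    score = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    if score >= 0.6:
        return "attacker"
    if score >= 0.3:
        return "bad_crawler"
    if score > 0.0:
        return "regular_user"
    return "unknown"

print(classify({"attack_urls": 1.0, "risky_methods": 1.0, "robots_violations": 1.0}))  # attacker
```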
+
+## Forward server header
+If Krawl is deployed behind a proxy such as NGINX, the **Server header** should be forwarded using the following configuration in your proxy:
```bash
-kubectl apply -f kubernetes/krawl-all-in-one-deploy.yaml
+location / {
+ proxy_pass https://your-krawl-instance;
+ proxy_pass_header Server;
+}
```
-Access the deception server:
+## API
+Krawl uses the following external APIs:
+- https://iprep.lcrawl.com (IP Reputation)
+- https://nominatim.openstreetmap.org/reverse (Reverse geocoding to city names)
+- https://api.ipify.org (Public IP discovery)
+- http://ident.me (Public IP discovery)
+- https://ifconfig.me (Public IP discovery)
+
+## Configuration
+Krawl uses a **configuration hierarchy** in which **environment variables take precedence over the configuration file**. This approach is recommended for Docker deployments and quick out-of-the-box customization.
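The precedence rule can be sketched in a few lines (a minimal illustration of the idea, not Krawl's actual loader; the key names are examples):

```python
# Sketch of env-over-file precedence: an environment variable, when set,
# overrides the value loaded from config.yaml.
import os

file_config = {"port": 5000, "delay": 100}  # values as loaded from config.yaml

def resolve(key: str, env_name: str, cast=int):
    """Return the environment value when set, else the config-file value."""
    raw = os.environ.get(env_name)
    return cast(raw) if raw is not None else file_config[key]

os.environ["KRAWL_PORT"] = "8080"
os.environ.pop("KRAWL_DELAY", None)  # ensure the fallback path is exercised

print(resolve("port", "KRAWL_PORT"))    # 8080: env var wins
print(resolve("delay", "KRAWL_DELAY"))  # 100: falls back to config.yaml
```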
+
+### Configuration via Environment Variables
+
+| Environment Variable | Description | Default |
+|----------------------|-------------|---------|
+| `CONFIG_LOCATION` | Path to YAML config file | `config.yaml` |
+| `KRAWL_PORT` | Server listening port | `5000` |
+| `KRAWL_DELAY` | Response delay in milliseconds | `100` |
+| `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` |
+| `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` |
+| `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` |
+| `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefgh...` |
+| `KRAWL_MAX_COUNTER` | Initial counter value | `10` |
+| `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None |
+| `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` |
+| `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated |
+| `KRAWL_API_SERVER_URL` | API server URL | None |
+| `KRAWL_API_SERVER_PORT` | API server port | `8080` |
+| `KRAWL_API_SERVER_PATH` | API server endpoint path | `/api/v2/users` |
+| `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` |
+| `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` |
+| `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in database | `30` |
+| `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` |
+| `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` |
+| `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` |
+| `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` |
+| `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` |
+| `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` |
+
+For example:
```bash
-kubectl get svc krawl-server -n krawl-system
-```
-
-Once the EXTERNAL-IP is assigned, access your deception server at `http://:5000`
-
-### From Source (Python 3.11+)
-
-Clone the repository:
-
-```bash
-git clone https://github.com/blessedrebus/krawl.git
-cd krawl/src
-```
-
-Run the server:
-
-```bash
-python3 server.py
-```
-
-Visit `http://localhost:5000` and access the dashboard at `http://localhost:5000/`
-
-## Configuration via Environment Variables
-
-To customize the deception server installation, environment variables can be specified using the naming convention: `KRAWL_` where `` is the configuration field name in uppercase with special characters converted:
-- `.` → `_`
-- `-` → `__` (double underscore)
-- ` ` (space) → `_`
-
-### Configuration Variables
-
-| Configuration Field | Environment Variable | Description | Default |
-|-----------|-----------|-------------|---------|
-| `port` | `KRAWL_PORT` | Server listening port | `5000` |
-| `delay` | `KRAWL_DELAY` | Response delay in milliseconds | `100` |
-| `server_header` | `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` |
-| `links_length_range` | `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` |
-| `links_per_page_range` | `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` |
-| `char_space` | `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789` |
-| `max_counter` | `KRAWL_MAX_COUNTER` | Initial counter value | `10` |
-| `canary_token_url` | `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None |
-| `canary_token_tries` | `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` |
-| `dashboard_secret_path` | `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated |
-| `api_server_url` | `KRAWL_API_SERVER_URL` | API server URL | None |
-| `api_server_port` | `KRAWL_API_SERVER_PORT` | API server port | `8080` |
-| `api_server_path` | `KRAWL_API_SERVER_PATH` | API server endpoint path | `/api/v2/users` |
-| `probability_error_codes` | `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` |
-| `database_path` | `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` |
-| `database_retention_days` | `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in database | `30` |
-| `http_risky_methods_threshold` | `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` |
-| `violated_robots_threshold` | `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` |
-| `uneven_request_timing_threshold` | `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` |
-| `uneven_request_timing_time_window_seconds` | `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` |
-| `user_agents_used_threshold` | `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` |
-| `attack_urls_threshold` | `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` |
-
-### Examples
-
-```bash
-# Set port and delay
-export KRAWL_PORT=8080
-export KRAWL_DELAY=200
-
# Set canary token
+export CONFIG_LOCATION="config.yaml"
export KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url"
-# Set tuple values (min,max format)
-export KRAWL_LINKS_LENGTH_RANGE="3,20"
+# Set links-per-page range (min,max format)
export KRAWL_LINKS_PER_PAGE_RANGE="5,25"
# Set analyzer thresholds
@@ -280,7 +232,7 @@ export KRAWL_VIOLATED_ROBOTS_THRESHOLD="0.15"
export KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard"
```
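The `min,max` tuple format used by the range variables above can be parsed as shown below (an illustrative sketch, not Krawl's actual code):

```python
# Parse a "min,max" range variable and draw a value from it.
import os
import random

os.environ["KRAWL_LINKS_PER_PAGE_RANGE"] = "5,25"
lo, hi = (int(x) for x in os.environ["KRAWL_LINKS_PER_PAGE_RANGE"].split(","))
links_per_page = random.randint(lo, hi)  # inclusive on both ends
print(lo, hi)  # 5 25
```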
-Or in Docker:
+Example of a Docker run with environment variables:
```bash
docker run -d \
@@ -292,36 +244,20 @@ docker run -d \
ghcr.io/blessedrebus/krawl:latest
```
-## robots.txt
-The actual (juicy) robots.txt configuration is the following
+### Configuration via config.yaml
+You can use the [config.yaml](config.yaml) file for more advanced configurations, such as Docker Compose or Helm chart deployments.
-```txt
-Disallow: /admin/
-Disallow: /api/
-Disallow: /backup/
-Disallow: /config/
-Disallow: /database/
-Disallow: /private/
-Disallow: /uploads/
-Disallow: /wp-admin/
-Disallow: /phpMyAdmin/
-Disallow: /admin/login.php
-Disallow: /api/v1/users
-Disallow: /api/v2/secrets
-Disallow: /.env
-Disallow: /credentials.txt
-Disallow: /passwords.txt
-Disallow: /.git/
-Disallow: /backup.sql
-Disallow: /db_backup.sql
-```
+# Honeypot
+Below is a complete overview of the Krawl honeypot’s capabilities.
+
+## robots.txt
+The full (juicy) robots.txt configuration [can be found here](src/templates/html/robots.txt).
## Honeypot pages
Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).
-
-
-
+
+
Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with “interesting” files, each assigned a random file size to look realistic.
@@ -329,21 +265,23 @@ Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/u
The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a “juicy” misconfiguration that crawlers and scanners often flag as information leakage.
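A minimal sketch of this deliberate `Content-Type` mismatch, using Python's standard `http.server` (the handler and the fake values below are illustrative, not Krawl's actual implementation):

```python
# Serve a fake .env body with a deliberately wrong Content-Type header.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request

class FakeEnvHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"DB_PASSWORD=hunter2\nSTRIPE_SECRET_KEY=sk_live_fake\n"
        self.send_response(200)
        # Plain text served as JSON, mimicking a "juicy" misconfiguration
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), FakeEnvHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

resp = urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/.env")
content_type = resp.headers["Content-Type"]
body = resp.read()
server.shutdown()
print(content_type)
```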
-
+The `/server` page displays randomly generated fake error information for each known server.
+
+
The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format
-
-
-
-
+
The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets
-
-
-
-
+
+
+Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.
+
+
+
+Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
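The randomized-error idea can be sketched as follows (the error templates are invented examples, not the ones Krawl ships):

```python
# Return a different randomized database error for each request, so automated
# SQL injection tools see inconsistent, noisy responses.
import random

ERROR_TEMPLATES = [
    "ERROR 1064 (42000): You have an error in your SQL syntax near '{frag}'",
    "ORA-00933: SQL command not properly ended at '{frag}'",
    'PG::SyntaxError: syntax error at or near "{frag}"',
]

def fake_sql_error(payload: str) -> str:
    """Pick a random DBMS-flavored error echoing part of the payload."""
    frag = payload[:16]
    return random.choice(ERROR_TEMPLATES).format(frag=frag)

print(fake_sql_error("' OR 1=1 --"))
```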
## Customizing the Canary Token
To create a custom canary token, visit https://canarytokens.org
@@ -384,11 +322,13 @@ Access the dashboard at `http://:/`
The dashboard shows:
- Total and unique accesses
-- Suspicious activity detection
-- Top IPs, paths, and user-agents
+- Suspicious activity and attack detection
+- Top IPs, paths, user-agents and GeoIP localization
- Real-time monitoring
-The attackers' triggered honeypot path and the suspicious activity (such as failed login attempts) are logged
+Attackers’ access to honeypot endpoints and related suspicious activity (such as failed login attempts) is logged.
+
+Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.

@@ -396,14 +336,7 @@ The top IP Addresses is shown along with top paths and User Agents

-### Retrieving Dashboard Path
-
-Check server startup logs or get the secret with
-
-```bash
-kubectl get secret krawl-server -n krawl-system \
- -o jsonpath='{.data.dashboard-path}' | base64 -d && echo
-```
+
## 🤝 Contributing
diff --git a/config.yaml b/config.yaml
index 3e1d644..c3424d6 100644
--- a/config.yaml
+++ b/config.yaml
@@ -22,12 +22,8 @@ canary:
dashboard:
# If set to null, a random secret path is auto-generated
# Can be set to "/dashboard" or similar; note this MUST include a forward slash
- secret_path: super-secret-dashboard-path
-
-api:
- server_url: null
- server_port: 8080
- server_path: "/api/v2/users"
+ # secret_path: super-secret-dashboard-path
+ secret_path: null
database:
path: "data/krawl.db"
diff --git a/helm/Chart.yaml b/helm/Chart.yaml
index b2b4cc3..9ff2db0 100644
--- a/helm/Chart.yaml
+++ b/helm/Chart.yaml
@@ -2,8 +2,8 @@ apiVersion: v2
name: krawl-chart
description: A Helm chart for Krawl honeypot server
type: application
-version: 0.2.1
-appVersion: 0.2.1
+version: 0.2.2
+appVersion: 0.2.2
keywords:
- honeypot
- security
diff --git a/helm/README.md b/helm/README.md
index 5e10f9c..d1ee9cd 100644
--- a/helm/README.md
+++ b/helm/README.md
@@ -10,6 +10,65 @@ A Helm chart for deploying the Krawl honeypot application on Kubernetes.
## Installation
+
+### Helm Chart
+
+Install with default values:
+
+```bash
+helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
+ --version 0.2.2 \
+ --namespace krawl-system \
+ --create-namespace
+```
+
+Or create a minimal `values.yaml` file:
+
+```yaml
+service:
+ type: LoadBalancer
+ port: 5000
+
+ingress:
+ enabled: true
+ className: "traefik"
+ hosts:
+ - host: krawl.example.com
+ paths:
+ - path: /
+ pathType: Prefix
+
+config:
+ server:
+ port: 5000
+ delay: 100
+ dashboard:
+ secret_path: null # Auto-generated if not set
+
+database:
+ persistence:
+ enabled: true
+ size: 1Gi
+```
+
+Install with custom values:
+
+```bash
+helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
+ --version 0.2.2 \
+ --namespace krawl-system \
+ --create-namespace \
+ -f values.yaml
+```
+
+To access the deception server:
+
+```bash
+kubectl get svc krawl -n krawl-system
+```
+
+Once the EXTERNAL-IP is assigned, access your deception server at `http://<EXTERNAL-IP>:5000`
+
### Add the repository (if applicable)
```bash
@@ -176,6 +235,15 @@ The following table lists the main configuration parameters of the Krawl chart a
|-----------|-------------|---------|
| `networkPolicy.enabled` | Enable network policy | `true` |
+### Retrieving Dashboard Path
+
+Check server startup logs or get the secret with
+
+```bash
+kubectl get secret krawl-server -n krawl-system \
+ -o jsonpath='{.data.dashboard-path}' | base64 -d && echo
+```
+
## Usage Examples
### Basic Installation
diff --git a/helm/templates/deployment.yaml b/helm/templates/deployment.yaml
index 5635fa3..f24261c 100644
--- a/helm/templates/deployment.yaml
+++ b/helm/templates/deployment.yaml
@@ -43,6 +43,10 @@ spec:
env:
- name: CONFIG_LOCATION
value: "config.yaml"
+ {{- if .Values.timezone }}
+ - name: TZ
+ value: {{ .Values.timezone | quote }}
+ {{- end }}
volumeMounts:
- name: config
mountPath: /app/config.yaml
diff --git a/helm/values.yaml b/helm/values.yaml
index 6d79b25..1a5d07b 100644
--- a/helm/values.yaml
+++ b/helm/values.yaml
@@ -49,6 +49,11 @@ resources:
cpu: 100m
memory: 64Mi
+# Container timezone configuration
+# Set this to change timezone (e.g., "America/New_York", "Europe/Rome")
+# If not set, container will use its default timezone
+timezone: ""
+
autoscaling:
enabled: false
minReplicas: 1
@@ -67,7 +72,6 @@ config:
server:
port: 5000
delay: 100
- timezone: null # IANA timezone (e.g., "America/New_York", "Europe/Rome"). If not set, system timezone is used.
links:
min_length: 5
max_length: 15
diff --git a/img/admin-page.png b/img/admin-page.png
index ba82843..790e3c3 100644
Binary files a/img/admin-page.png and b/img/admin-page.png differ
diff --git a/img/api-secrets-page.png b/img/api-secrets-page.png
deleted file mode 100644
index 77b47c8..0000000
Binary files a/img/api-secrets-page.png and /dev/null differ
diff --git a/img/api-users-page.png b/img/api-users-page.png
deleted file mode 100644
index 6746594..0000000
Binary files a/img/api-users-page.png and /dev/null differ
diff --git a/img/credentials-and-passwords.png b/img/credentials-and-passwords.png
new file mode 100644
index 0000000..acb134a
Binary files /dev/null and b/img/credentials-and-passwords.png differ
diff --git a/img/credentials-page.png b/img/credentials-page.png
deleted file mode 100644
index bc3fffa..0000000
Binary files a/img/credentials-page.png and /dev/null differ
diff --git a/img/dashboard-1.png b/img/dashboard-1.png
index ad11dd8..4479914 100644
Binary files a/img/dashboard-1.png and b/img/dashboard-1.png differ
diff --git a/img/dashboard-2.png b/img/dashboard-2.png
index 65c0766..e6a208d 100644
Binary files a/img/dashboard-2.png and b/img/dashboard-2.png differ
diff --git a/img/dashboard-3.png b/img/dashboard-3.png
new file mode 100644
index 0000000..e7b24df
Binary files /dev/null and b/img/dashboard-3.png differ
diff --git a/img/env-page.png b/img/env-page.png
deleted file mode 100644
index a738732..0000000
Binary files a/img/env-page.png and /dev/null differ
diff --git a/img/geoip_dashboard.png b/img/geoip_dashboard.png
new file mode 100644
index 0000000..6825be7
Binary files /dev/null and b/img/geoip_dashboard.png differ
diff --git a/img/ip-reputation.png b/img/ip-reputation.png
new file mode 100644
index 0000000..9119e63
Binary files /dev/null and b/img/ip-reputation.png differ
diff --git a/img/passwords-page.png b/img/passwords-page.png
deleted file mode 100644
index c9ca2f0..0000000
Binary files a/img/passwords-page.png and /dev/null differ
diff --git a/img/server-and-env-page.png b/img/server-and-env-page.png
new file mode 100644
index 0000000..700c39d
Binary files /dev/null and b/img/server-and-env-page.png differ
diff --git a/img/sql_injection.png b/img/sql_injection.png
new file mode 100644
index 0000000..8eb8ad3
Binary files /dev/null and b/img/sql_injection.png differ
diff --git a/img/users-and-secrets.png b/img/users-and-secrets.png
new file mode 100644
index 0000000..f99297e
Binary files /dev/null and b/img/users-and-secrets.png differ
diff --git a/kubernetes/README.md b/kubernetes/README.md
new file mode 100644
index 0000000..d803496
--- /dev/null
+++ b/kubernetes/README.md
@@ -0,0 +1,47 @@
+### Kubernetes
+
+Apply all manifests with:
+
+```bash
+kubectl apply -f https://raw.githubusercontent.com/BlessedRebuS/Krawl/refs/heads/main/kubernetes/krawl-all-in-one-deploy.yaml
+```
+
+Or clone the repo and apply the manifest:
+
+```bash
+kubectl apply -f kubernetes/krawl-all-in-one-deploy.yaml
+```
+
+Access the deception server:
+
+```bash
+kubectl get svc krawl-server -n krawl-system
+```
+
+Once the EXTERNAL-IP is assigned, access your deception server at `http://<EXTERNAL-IP>:5000`
+
+### Retrieving Dashboard Path
+
+Check server startup logs or get the secret with
+
+```bash
+kubectl get secret krawl-server -n krawl-system \
+ -o jsonpath='{.data.dashboard-path}' | base64 -d && echo
+```
+
+### From Source (Python 3.11+)
+
+Clone the repository:
+
+```bash
+git clone https://github.com/blessedrebus/krawl.git
+cd krawl/src
+```
+
+Run the server:
+
+```bash
+python3 server.py
+```
+
+Visit `http://localhost:5000` and access the dashboard at `http://localhost:5000/<dashboard-secret-path>` (the path is printed in the startup logs)
diff --git a/src/config.py b/src/config.py
index 71cef0e..d3252e7 100644
--- a/src/config.py
+++ b/src/config.py
@@ -28,9 +28,6 @@ class Config:
canary_token_url: Optional[str] = None
canary_token_tries: int = 10
dashboard_secret_path: str = None
- api_server_url: Optional[str] = None
- api_server_port: int = 8080
- api_server_path: str = "/api/v2/users"
probability_error_codes: int = 0 # Percentage (0-100)
# Crawl limiting settings - for legitimate vs malicious crawlers
@@ -187,9 +184,6 @@ class Config:
canary_token_url=canary.get("token_url"),
canary_token_tries=canary.get("token_tries", 10),
dashboard_secret_path=dashboard_path,
- api_server_url=api.get("server_url"),
- api_server_port=api.get("server_port", 8080),
- api_server_path=api.get("server_path", "/api/v2/users"),
probability_error_codes=behavior.get("probability_error_codes", 0),
database_path=database.get("path", "data/krawl.db"),
database_retention_days=database.get("retention_days", 30),
diff --git a/src/database.py b/src/database.py
index 5af71dd..36cc7e1 100644
--- a/src/database.py
+++ b/src/database.py
@@ -94,6 +94,9 @@ class DatabaseManager:
# Create all tables
Base.metadata.create_all(self._engine)
+ # Run automatic migrations for backward compatibility
+ self._run_migrations(database_path)
+
# Set restrictive file permissions (owner read/write only)
if os.path.exists(database_path):
try:
@@ -104,6 +107,47 @@ class DatabaseManager:
self._initialized = True
+ def _run_migrations(self, database_path: str) -> None:
+ """
+ Run automatic migrations for backward compatibility.
+ Adds missing columns that were added in newer versions.
+
+ Args:
+ database_path: Path to the SQLite database file
+ """
+ import sqlite3
+
+ try:
+ conn = sqlite3.connect(database_path)
+ cursor = conn.cursor()
+
+ # Check if latitude/longitude columns exist
+ cursor.execute("PRAGMA table_info(ip_stats)")
+ columns = [row[1] for row in cursor.fetchall()]
+
+ migrations_run = []
+
+ # Add latitude column if missing
+ if "latitude" not in columns:
+ cursor.execute("ALTER TABLE ip_stats ADD COLUMN latitude REAL")
+ migrations_run.append("latitude")
+
+ # Add longitude column if missing
+ if "longitude" not in columns:
+ cursor.execute("ALTER TABLE ip_stats ADD COLUMN longitude REAL")
+ migrations_run.append("longitude")
+
+ if migrations_run:
+ conn.commit()
+ applogger.info(
+ f"Auto-migration: Added columns {', '.join(migrations_run)} to ip_stats table"
+ )
+
+ conn.close()
+ except Exception as e:
+ applogger.error(f"Auto-migration failed: {e}")
+ # Don't raise - allow app to continue even if migration fails
+
@property
def session(self) -> Session:
"""Get a thread-local database session."""
@@ -399,6 +443,8 @@ class DatabaseManager:
asn_org: str,
list_on: Dict[str, str],
city: Optional[str] = None,
+ latitude: Optional[float] = None,
+ longitude: Optional[float] = None,
) -> None:
"""
Update IP rep stats
@@ -410,6 +456,8 @@ class DatabaseManager:
asn_org: IP address ASN ORG
list_on: public lists containing the IP address
city: City name (optional)
+ latitude: Latitude coordinate (optional)
+ longitude: Longitude coordinate (optional)
"""
session = self.session
@@ -423,6 +471,10 @@ class DatabaseManager:
ip_stats.list_on = list_on
if city:
ip_stats.city = city
+ if latitude is not None:
+ ip_stats.latitude = latitude
+ if longitude is not None:
+ ip_stats.longitude = longitude
session.commit()
except Exception as e:
session.rollback()
@@ -433,7 +485,7 @@ class DatabaseManager:
def get_unenriched_ips(self, limit: int = 100) -> List[str]:
"""
Get IPs that don't have complete reputation data yet.
- Returns IPs without country_code OR without city data.
+ Returns IPs without country_code, city, latitude, or longitude data.
Excludes RFC1918 private addresses and other non-routable IPs.
Args:
@@ -442,27 +494,61 @@ class DatabaseManager:
Returns:
List of IP addresses without complete reputation data
"""
+ from sqlalchemy.exc import OperationalError
+
session = self.session
try:
- ips = (
- session.query(IpStats.ip)
- .filter(
- or_(IpStats.country_code.is_(None), IpStats.city.is_(None)),
- ~IpStats.ip.like("10.%"),
- ~IpStats.ip.like("172.16.%"),
- ~IpStats.ip.like("172.17.%"),
- ~IpStats.ip.like("172.18.%"),
- ~IpStats.ip.like("172.19.%"),
- ~IpStats.ip.like("172.2_.%"),
- ~IpStats.ip.like("172.30.%"),
- ~IpStats.ip.like("172.31.%"),
- ~IpStats.ip.like("192.168.%"),
- ~IpStats.ip.like("127.%"),
- ~IpStats.ip.like("169.254.%"),
+ # Try to query including latitude/longitude (for backward compatibility)
+ try:
+ ips = (
+ session.query(IpStats.ip)
+ .filter(
+ or_(
+ IpStats.country_code.is_(None),
+ IpStats.city.is_(None),
+ IpStats.latitude.is_(None),
+ IpStats.longitude.is_(None),
+ ),
+ ~IpStats.ip.like("10.%"),
+ ~IpStats.ip.like("172.16.%"),
+ ~IpStats.ip.like("172.17.%"),
+ ~IpStats.ip.like("172.18.%"),
+ ~IpStats.ip.like("172.19.%"),
+ ~IpStats.ip.like("172.2_.%"),
+ ~IpStats.ip.like("172.30.%"),
+ ~IpStats.ip.like("172.31.%"),
+ ~IpStats.ip.like("192.168.%"),
+ ~IpStats.ip.like("127.%"),
+ ~IpStats.ip.like("169.254.%"),
+ )
+ .limit(limit)
+ .all()
)
- .limit(limit)
- .all()
- )
+ except OperationalError as e:
+ # If latitude/longitude columns don't exist yet, fall back to old query
+ if "no such column" in str(e).lower():
+ ips = (
+ session.query(IpStats.ip)
+ .filter(
+ or_(IpStats.country_code.is_(None), IpStats.city.is_(None)),
+ ~IpStats.ip.like("10.%"),
+ ~IpStats.ip.like("172.16.%"),
+ ~IpStats.ip.like("172.17.%"),
+ ~IpStats.ip.like("172.18.%"),
+ ~IpStats.ip.like("172.19.%"),
+ ~IpStats.ip.like("172.2_.%"),
+ ~IpStats.ip.like("172.30.%"),
+ ~IpStats.ip.like("172.31.%"),
+ ~IpStats.ip.like("192.168.%"),
+ ~IpStats.ip.like("127.%"),
+ ~IpStats.ip.like("169.254.%"),
+ )
+ .limit(limit)
+ .all()
+ )
+ else:
+ raise
+
return [ip[0] for ip in ips]
finally:
self.close_session()
@@ -718,6 +804,8 @@ class DatabaseManager:
"last_seen": a.last_seen.isoformat() if a.last_seen else None,
"country_code": a.country_code,
"city": a.city,
+ "latitude": a.latitude,
+ "longitude": a.longitude,
"asn": a.asn,
"asn_org": a.asn_org,
"reputation_score": a.reputation_score,
@@ -813,6 +901,8 @@ class DatabaseManager:
"last_seen": ip.last_seen.isoformat() if ip.last_seen else None,
"country_code": ip.country_code,
"city": ip.city,
+ "latitude": ip.latitude,
+ "longitude": ip.longitude,
"asn": ip.asn,
"asn_org": ip.asn_org,
"reputation_score": ip.reputation_score,
diff --git a/src/exports/malicious_ips.txt b/src/exports/malicious_ips.txt
deleted file mode 100644
index 2541a21..0000000
--- a/src/exports/malicious_ips.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-175.23.45.67
-210.45.67.89
diff --git a/src/handler.py b/src/handler.py
index b3c76e7..0a6abb2 100644
--- a/src/handler.py
+++ b/src/handler.py
@@ -493,6 +493,47 @@ class Handler(BaseHTTPRequestHandler):
return
user_agent = self._get_user_agent()
+ # Handle static files for dashboard
+ if self.config.dashboard_secret_path and self.path.startswith(
+ f"{self.config.dashboard_secret_path}/static/"
+ ):
+ import os
+
+ file_path = self.path.replace(
+ f"{self.config.dashboard_secret_path}/static/", ""
+ )
+ static_dir = os.path.realpath(
+ os.path.join(os.path.dirname(__file__), "templates", "static")
+ )
+ # Resolve ".." and symlinks so the containment check cannot be bypassed
+ full_path = os.path.realpath(os.path.join(static_dir, file_path))
+
+ # Security check: ensure the resolved path is within the static directory
+ if os.path.commonpath(
+ [full_path, static_dir]
+ ) == static_dir and os.path.exists(full_path):
+ try:
+ with open(full_path, "rb") as f:
+ content = f.read()
+ self.send_response(200)
+ if file_path.endswith(".svg"):
+ self.send_header("Content-type", "image/svg+xml")
+ elif file_path.endswith(".css"):
+ self.send_header("Content-type", "text/css")
+ elif file_path.endswith(".js"):
+ self.send_header("Content-type", "application/javascript")
+ else:
+ self.send_header("Content-type", "application/octet-stream")
+ self.send_header("Content-Length", str(len(content)))
+ self.end_headers()
+ self.wfile.write(content)
+ return
+ except Exception as e:
+ self.app_logger.error(f"Error serving static file: {e}")
+
+ self.send_response(404)
+ self.send_header("Content-type", "text/plain")
+ self.end_headers()
+ self.wfile.write(b"Not found")
+ return
+
if (
self.config.dashboard_secret_path
and self.path == self.config.dashboard_secret_path
diff --git a/src/models.py b/src/models.py
index 3789ab2..2dbeb30 100644
--- a/src/models.py
+++ b/src/models.py
@@ -8,7 +8,16 @@ Stores access logs, credential attempts, attack detections, and IP statistics.
from datetime import datetime
from typing import Optional, List, Dict
-from sqlalchemy import String, Integer, Boolean, DateTime, ForeignKey, Index, JSON
+from sqlalchemy import (
+ String,
+ Integer,
+ Boolean,
+ DateTime,
+ Float,
+ ForeignKey,
+ Index,
+ JSON,
+)
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
from sanitizer import (
@@ -153,6 +162,8 @@ class IpStats(Base):
# GeoIP fields (populated by future enrichment)
country_code: Mapped[Optional[str]] = mapped_column(String(2), nullable=True)
city: Mapped[Optional[str]] = mapped_column(String(MAX_CITY_LENGTH), nullable=True)
+ latitude: Mapped[Optional[float]] = mapped_column(Float, nullable=True)
+ longitude: Mapped[Optional[float]] = mapped_column(Float, nullable=True)
asn: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
asn_org: Mapped[Optional[str]] = mapped_column(
String(MAX_ASN_ORG_LENGTH), nullable=True
diff --git a/src/server.py b/src/server.py
index 524359d..94f1d1e 100644
--- a/src/server.py
+++ b/src/server.py
@@ -29,26 +29,26 @@ def print_usage():
print("If no file is provided, random links will be generated.\n")
print("Configuration:")
print(" Configuration is loaded from a YAML file (default: config.yaml)")
- print(" Set CONFIG_LOCATION environment variable to use a different file.\n")
- print(" Example config.yaml structure:")
- print(" server:")
- print(" port: 5000")
- print(" delay: 100")
- print(" links:")
- print(" min_length: 5")
- print(" max_length: 15")
- print(" min_per_page: 10")
- print(" max_per_page: 15")
- print(" canary:")
- print(" token_url: null")
- print(" token_tries: 10")
- print(" dashboard:")
- print(" secret_path: null # auto-generated if not set")
- print(" database:")
- print(' path: "data/krawl.db"')
- print(" retention_days: 30")
- print(" behavior:")
- print(" probability_error_codes: 0")
+ print("Set CONFIG_LOCATION environment variable to use a different file.\n")
+ print("Example config.yaml structure:")
+ print("server:")
+ print("  port: 5000")
+ print("  delay: 100")
+ print("links:")
+ print("  min_length: 5")
+ print("  max_length: 15")
+ print("  min_per_page: 10")
+ print("  max_per_page: 15")
+ print("canary:")
+ print("  token_url: null")
+ print("  token_tries: 10")
+ print("dashboard:")
+ print("  secret_path: null  # auto-generated if not set")
+ print("database:")
+ print('  path: "data/krawl.db"')
+ print("  retention_days: 30")
+ print("behavior:")
+ print("  probability_error_codes: 0")
def main():
@@ -103,8 +103,15 @@ def main():
    tasks_master.run_scheduled_tasks()
    try:
+ banner = (
+ "\n============================================================\n"
+ "DASHBOARD AVAILABLE AT\n"
+ f"{config.dashboard_secret_path}\n"
+ "============================================================"
+ )
+ app_logger.info(banner)
app_logger.info(f"Starting deception server on port {config.port}...")
- app_logger.info(f"Dashboard available at: {config.dashboard_secret_path}")
if config.canary_token_url:
app_logger.info(
f"Canary token will appear after {config.canary_token_tries} tries"
diff --git a/src/tasks/fetch_ip_rep.py b/src/tasks/fetch_ip_rep.py
index a005c62..eac6645 100644
--- a/src/tasks/fetch_ip_rep.py
+++ b/src/tasks/fetch_ip_rep.py
@@ -45,6 +45,8 @@ def main():
country_iso_code = geoip_data.get("country_iso_code")
asn = geoip_data.get("asn_autonomous_system_number")
asn_org = geoip_data.get("asn_autonomous_system_organization")
+ latitude = geoip_data.get("location_latitude")
+ longitude = geoip_data.get("location_longitude")
# Extract city from coordinates using reverse geocoding
city = extract_city_from_coordinates(geoip_data)
@@ -62,6 +64,8 @@ def main():
sanitized_asn_org,
sanitized_list_on,
sanitized_city,
+ latitude,
+ longitude,
)
except requests.RequestException as e:
app_logger.warning(f"Failed to fetch IP rep for {ip}: {e}")
diff --git a/src/tasks/top_attacking_ips.py b/src/tasks/top_attacking_ips.py
index 73a135c..6e3ecd7 100644
--- a/src/tasks/top_attacking_ips.py
+++ b/src/tasks/top_attacking_ips.py
@@ -6,7 +6,7 @@ from zoneinfo import ZoneInfo
from logger import get_app_logger
from database import get_database
from config import get_config
-from models import AccessLog
+from models import AccessLog, IpStats
from ip_utils import is_local_or_private_ip, is_valid_public_ip
from sqlalchemy import distinct
@@ -44,7 +44,8 @@ def has_recent_honeypot_access(session, minutes: int = 5) -> bool:
def main():
"""
- Export all IPs flagged as suspicious to a text file.
+ Export all attacker IPs to a text file, matching the "Attackers by Total Requests" dashboard table.
+ Uses the same query as the dashboard: IpStats where category == "attacker", ordered by total_requests.
TasksMaster will call this function based on the cron schedule.
"""
task_name = TASK_CONFIG.get("name")
@@ -61,10 +62,11 @@ def main():
)
return
- # Query distinct suspicious IPs
- results = (
- session.query(distinct(AccessLog.ip))
- .filter(AccessLog.is_suspicious == True)
+ # Query attacker IPs from IpStats (same as dashboard "Attackers by Total Requests")
+ attackers = (
+ session.query(IpStats)
+ .filter(IpStats.category == "attacker")
+ .order_by(IpStats.total_requests.desc())
.all()
)
@@ -72,7 +74,11 @@ def main():
config = get_config()
server_ip = config.get_server_ip()
- public_ips = [ip for (ip,) in results if is_valid_public_ip(ip, server_ip)]
+ public_ips = [
+ attacker.ip
+ for attacker in attackers
+ if is_valid_public_ip(attacker.ip, server_ip)
+ ]
# Ensure exports directory exists
os.makedirs(EXPORTS_DIR, exist_ok=True)
@@ -83,8 +89,8 @@ def main():
f.write(f"{ip}\n")
app_logger.info(
- f"[Background Task] {task_name} exported {len(public_ips)} public IPs "
- f"(filtered {len(results) - len(public_ips)} local/private IPs) to {OUTPUT_FILE}"
+ f"[Background Task] {task_name} exported {len(public_ips)} attacker IPs "
+ f"(filtered {len(attackers) - len(public_ips)} local/private IPs) to {OUTPUT_FILE}"
)
except Exception as e:
diff --git a/src/templates/dashboard_template.py b/src/templates/dashboard_template.py
index 667de3d..89ca4fb 100644
--- a/src/templates/dashboard_template.py
+++ b/src/templates/dashboard_template.py
@@ -68,6 +68,7 @@ def generate_dashboard(stats: dict, dashboard_path: str = "") -> str:
Krawl Dashboard
+
@@ -84,6 +85,30 @@ def generate_dashboard(stats: dict, dashboard_path: str = "") -> str:
margin: 0 auto;
position: relative;
}}
+ .github-logo {{
+ position: absolute;
+ top: 0;
+ left: 0;
+ display: flex;
+ align-items: center;
+ gap: 8px;
+ text-decoration: none;
+ color: #58a6ff;
+ transition: color 0.2s;
+ }}
+ .github-logo:hover {{
+ color: #79c0ff;
+ }}
+ .github-logo svg {{
+ width: 32px;
+ height: 32px;
+ fill: currentColor;
+ }}
+ .github-logo-text {{
+ font-size: 14px;
+ font-weight: 600;
+ text-decoration: none;
+ }}
h1 {{
color: #58a6ff;
text-align: center;
@@ -536,17 +561,25 @@ def generate_dashboard(stats: dict, dashboard_path: str = "") -> str:
filter: none;
}}
.leaflet-popup-content-wrapper {{
- background-color: #161b22;
+ background-color: #0d1117;
color: #c9d1d9;
border: 1px solid #30363d;
- border-radius: 4px;
+ border-radius: 6px;
+ padding: 0;
+ }}
+ .leaflet-popup-content {{
+ margin: 0;
+ min-width: 280px;
}}
.leaflet-popup-content-wrapper a {{
color: #58a6ff;
}}
.leaflet-popup-tip {{
- background: #161b22;
- border-top: 6px solid #30363d;
+ background: #0d1117;
+ border: 1px solid #30363d;
+ }}
+ .ip-detail-popup .leaflet-popup-content-wrapper {{
+ max-width: 340px !important;
}}
/* Remove the default leaflet icon background */
.ip-custom-marker {{
@@ -614,6 +647,12 @@ def generate_dashboard(stats: dict, dashboard_path: str = "") -> str: