diff --git a/README.md b/README.md index f7fe399..1d0e8a5 100644 --- a/README.md +++ b/README.md @@ -1,323 +1,371 @@ -

🕷️ Krawl

- -

- - -

-
- -

- A modern, customizable zero-dependencies honeypot server designed to detect and track malicious activity through deceptive web pages, fake credentials, and canary tokens. -

- -
- - License - - - Release - -
- -
- - GitHub Container Registry - - - Kubernetes - - - Helm Chart - -
- -
- -

- What is Krawl? • - Quick Start • - Honeypot Pages • - Dashboard • - Todo • - Contributing -

- -
-
- -## Demo -Tip: crawl the `robots.txt` paths for additional fun -### Krawl URL: [http://demo.krawlme.com](http://demo.krawlme.com) -### View the dashboard [http://demo.krawlme.com/das_dashboard](http://demo.krawlme.com/das_dashboard) - -## What is Krawl? - -**Krawl** is a cloud‑native deception server designed to detect, delay, and analyze malicious web crawlers and automated scanners. - -It creates realistic fake web applications filled with low‑hanging fruit such as admin panels, configuration files, and exposed fake credentials to attract and identify suspicious activity. - -By wasting attacker resources, Krawl helps clearly distinguish malicious behavior from legitimate crawlers. - -It features: - -- **Spider Trap Pages**: Infinite random links to waste crawler resources based on the [spidertrap project](https://github.com/adhdproject/spidertrap) -- **Fake Login Pages**: WordPress, phpMyAdmin, admin panels -- **Honeypot Paths**: Advertised in robots.txt to catch scanners -- **Fake Credentials**: Realistic-looking usernames, passwords, API keys -- **[Canary Token](#customizing-the-canary-token) Integration**: External alert triggering -- **Real-time Dashboard**: Monitor suspicious activity -- **Customizable Wordlists**: Easy JSON-based configuration -- **Random Error Injection**: Mimic real server behavior - -![asd](img/deception-page.png) - -## 🚀 Quick Start -## Helm Chart - -Install with default values - -```bash -helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \ - --namespace krawl-system \ - --create-namespace -``` - -Install with custom [canary token](#customizing-the-canary-token) - -```bash -helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \ - --namespace krawl-system \ - --create-namespace \ - --set config.canaryTokenUrl="http://your-canary-token-url" -``` - -To access the deception server - -```bash -kubectl get svc krawl -n krawl-system -``` - -Once the EXTERNAL-IP is assigned, access your deception server at: - -``` -http://:5000 
-``` - -## Kubernetes / Kustomize -Apply all manifests with - -```bash -kubectl apply -f https://raw.githubusercontent.com/BlessedRebuS/Krawl/refs/heads/main/manifests/krawl-all-in-one-deploy.yaml -``` - -Retrieve dashboard path with -```bash -kubectl get secret krawl-server -n krawl-system -o jsonpath='{.data.dashboard-path}' | base64 -d -``` - -Or clone the repo and apply the `manifest` folder with - -```bash -kubectl apply -k manifests -``` - -## Docker -Run Krawl as a docker container with - -```bash -docker run -d \ - -p 5000:5000 \ - -e CANARY_TOKEN_URL="http://your-canary-token-url" \ - --name krawl \ - ghcr.io/blessedrebus/krawl:latest -``` - -## Docker Compose -Run Krawl with docker-compose in the project folder with - -```bash -docker-compose up -d -``` - -Stop it with - -```bash -docker-compose down -``` - -## Python 3.11+ - -Clone the repository - -```bash -git clone https://github.com/blessedrebus/krawl.git -cd krawl/src -``` -Run the server -```bash -python3 server.py -``` - -Visit - -`http://localhost:5000` - -To access the dashboard - -`http://localhost:5000/` - -## Configuration via Environment Variables - -To customize the deception server installation several **environment variables** can be specified. 
- -| Variable | Description | Default | -|----------|-------------|---------| -| `PORT` | Server listening port | `5000` | -| `DELAY` | Response delay in milliseconds | `100` | -| `LINKS_MIN_LENGTH` | Minimum random link length | `5` | -| `LINKS_MAX_LENGTH` | Maximum random link length | `15` | -| `LINKS_MIN_PER_PAGE` | Minimum links per page | `10` | -| `LINKS_MAX_PER_PAGE` | Maximum links per page | `15` | -| `MAX_COUNTER` | Initial counter value | `10` | -| `CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` | -| `CANARY_TOKEN_URL` | External canary token URL | None | -| `DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated | -| `PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` | -| `SERVER_HEADER` | HTTP Server header for deception | `Apache/2.2.22 (Ubuntu)` | -| `TIMEZONE` | IANA timezone for logs and dashboard (e.g., `America/New_York`, `Europe/Rome`) | System timezone | - -## robots.txt -The actual (juicy) robots.txt configuration is the following - -```txt -Disallow: /admin/ -Disallow: /api/ -Disallow: /backup/ -Disallow: /config/ -Disallow: /database/ -Disallow: /private/ -Disallow: /uploads/ -Disallow: /wp-admin/ -Disallow: /phpMyAdmin/ -Disallow: /admin/login.php -Disallow: /api/v1/users -Disallow: /api/v2/secrets -Disallow: /.env -Disallow: /credentials.txt -Disallow: /passwords.txt -Disallow: /.git/ -Disallow: /backup.sql -Disallow: /db_backup.sql -``` - -## Honeypot pages -Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing). - -
- -
- -Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with “interesting” files, each assigned a random file size to look realistic. - -![directory-page](img/directory-page.png) - -The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a “juicy” misconfiguration that crawlers and scanners often flag as information leakage. - -![env-page](img/env-page.png) - -The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format - -
- - -
- -The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets - -
- - -
- -## Customizing the Canary Token -To create a custom canary token, visit https://canarytokens.org - -and generate a “Web bug” canary token. - -This optional token is triggered when a crawler fully traverses the webpage until it reaches 0. At that point, a URL is returned. When this URL is requested, it sends an alert to the user via email, including the visitor’s IP address and user agent. - - -To enable this feature, set the canary token URL [using the environment variable](#configuration-via-environment-variables) `CANARY_TOKEN_URL`. - -## Customizing the wordlist - -Edit `wordlists.json` to customize fake data for your use case - -```json -{ - "usernames": { - "prefixes": ["admin", "root", "user"], - "suffixes": ["_prod", "_dev", "123"] - }, - "passwords": { - "prefixes": ["P@ssw0rd", "Admin"], - "simple": ["test", "password"] - }, - "directory_listing": { - "files": ["credentials.txt", "backup.sql"], - "directories": ["admin/", "backup/"] - } -} -``` - -or **values.yaml** in the case of helm chart installation - -## Dashboard - -Access the dashboard at `http://:/` - -The dashboard shows: -- Total and unique accesses -- Suspicious activity detection -- Top IPs, paths, and user-agents -- Real-time monitoring - -The attackers' triggered honeypot path and the suspicious activity (such as failed login attempts) are logged - -![dashboard-1](img/dashboard-1.png) - -The top IP Addresses is shown along with top paths and User Agents - -![dashboard-2](img/dashboard-2.png) - -### Retrieving Dashboard Path - -Check server startup logs or get the secret with - -```bash -kubectl get secret krawl-server -n krawl-system \ - -o jsonpath='{.data.dashboard-path}' | base64 -d && echo -``` - -## 🤝 Contributing - -Contributions welcome! Please: -1. Fork the repository -2. Create a feature branch -3. Make your changes -4. Submit a pull request (explain the changes!) - - -
- -## ⚠️ Disclaimer - -**This is a deception/honeypot system.** -Deploy in isolated environments and monitor carefully for security events. -Use responsibly and in compliance with applicable laws and regulations. - -## Star History -Star History Chart +

🕷️ Krawl

+ +

+ + +

+
+ +

+ A modern, customizable zero-dependencies honeypot server designed to detect and track malicious activity through deceptive web pages, fake credentials, and canary tokens. +

+ + + + + +
+ +

+ What is Krawl? • + Quick Start • + Honeypot Pages • + Dashboard • + Todo • + Contributing +

+ +
+
+ +## Demo +Tip: crawl the `robots.txt` paths for additional fun +### Krawl URL: [http://demo.krawlme.com](http://demo.krawlme.com) +### View the dashboard [http://demo.krawlme.com/das_dashboard](http://demo.krawlme.com/das_dashboard) + +## What is Krawl? + +**Krawl** is a cloud‑native deception server designed to detect, delay, and analyze malicious web crawlers and automated scanners. + +It creates realistic fake web applications filled with low‑hanging fruit such as admin panels, configuration files, and exposed fake credentials to attract and identify suspicious activity. + +By wasting attacker resources, Krawl helps clearly distinguish malicious behavior from legitimate crawlers. + +It features: + +- **Spider Trap Pages**: Infinite random links to waste crawler resources based on the [spidertrap project](https://github.com/adhdproject/spidertrap) +- **Fake Login Pages**: WordPress, phpMyAdmin, admin panels +- **Honeypot Paths**: Advertised in robots.txt to catch scanners +- **Fake Credentials**: Realistic-looking usernames, passwords, API keys +- **[Canary Token](#customizing-the-canary-token) Integration**: External alert triggering +- **Real-time Dashboard**: Monitor suspicious activity +- **Customizable Wordlists**: Easy JSON-based configuration +- **Random Error Injection**: Mimic real server behavior + +![asd](img/deception-page.png) + +## 🚀 Quick Start +## Helm Chart + +Install with default values + +```bash +helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \ + --namespace krawl-system \ + --create-namespace +``` + +Install with custom [canary token](#customizing-the-canary-token) + +```bash +helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \ + --namespace krawl-system \ + --create-namespace \ + --set config.canaryTokenUrl="http://your-canary-token-url" +``` + +To access the deception server + +```bash +kubectl get svc krawl -n krawl-system +``` + +Once the EXTERNAL-IP is assigned, access your deception server at: + +``` +http://:5000 
+``` + +## Kubernetes / Kustomize +Apply all manifests with + +```bash +kubectl apply -f https://raw.githubusercontent.com/BlessedRebuS/Krawl/refs/heads/main/manifests/krawl-all-in-one-deploy.yaml +``` + +Retrieve dashboard path with +```bash +kubectl get secret krawl-server -n krawl-system -o jsonpath='{.data.dashboard-path}' | base64 -d +``` + +Or clone the repo and apply the `manifest` folder with + +```bash +kubectl apply -k manifests +``` + +## Docker +Run Krawl as a docker container with + +```bash +docker run -d \ + -p 5000:5000 \ + -e CANARY_TOKEN_URL="http://your-canary-token-url" \ + --name krawl \ + ghcr.io/blessedrebus/krawl:latest +``` + +## Docker Compose +Run Krawl with docker-compose in the project folder with + +```bash +docker-compose up -d +``` + +Stop it with + +```bash +docker-compose down +``` + +## Python 3.11+ + +Clone the repository + +```bash +git clone https://github.com/blessedrebus/krawl.git +cd krawl/src +``` +Run the server +```bash +python3 server.py +``` + +Visit + +`http://localhost:5000` + +To access the dashboard + +`http://localhost:5000/` + +## Configuration via Environment Variables + +To customize the deception server installation, environment variables can be specified using the naming convention: `KRAWL_` where `` is the configuration field name in uppercase with special characters converted: +- `.` → `_` +- `-` → `__` (double underscore) +- ` ` (space) → `_` + +### Configuration Variables + +| Configuration Field | Environment Variable | Description | Default | +|-----------|-----------|-------------|---------| +| `port` | `KRAWL_PORT` | Server listening port | `5000` | +| `delay` | `KRAWL_DELAY` | Response delay in milliseconds | `100` | +| `server_header` | `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` | +| `links_length_range` | `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` | +| `links_per_page_range` | `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` | +| 
`char_space` | `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789` | +| `max_counter` | `KRAWL_MAX_COUNTER` | Initial counter value | `10` | +| `canary_token_url` | `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None | +| `canary_token_tries` | `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` | +| `dashboard_secret_path` | `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated | +| `api_server_url` | `KRAWL_API_SERVER_URL` | API server URL | None | +| `api_server_port` | `KRAWL_API_SERVER_PORT` | API server port | `8080` | +| `api_server_path` | `KRAWL_API_SERVER_PATH` | API server endpoint path | `/api/v2/users` | +| `probability_error_codes` | `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` | +| `database_path` | `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` | +| `database_retention_days` | `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in database | `30` | +| `http_risky_methods_threshold` | `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` | +| `violated_robots_threshold` | `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` | +| `uneven_request_timing_threshold` | `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` | +| `uneven_request_timing_time_window_seconds` | `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` | +| `user_agents_used_threshold` | `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` | +| `attack_urls_threshold` | `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` | + +### Examples + +```bash +# Set port and delay +export KRAWL_PORT=8080 +export KRAWL_DELAY=200 + +# Set canary token +export KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" + 
+# Set tuple values (min,max format) +export KRAWL_LINKS_LENGTH_RANGE="3,20" +export KRAWL_LINKS_PER_PAGE_RANGE="5,25" + +# Set analyzer thresholds +export KRAWL_HTTP_RISKY_METHODS_THRESHOLD="0.2" +export KRAWL_VIOLATED_ROBOTS_THRESHOLD="0.15" + +# Set custom dashboard path +export KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard" +``` + +Or in Docker: + +```bash +docker run -d \ + -p 5000:5000 \ + -e KRAWL_PORT=5000 \ + -e KRAWL_DELAY=100 \ + -e KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" \ + --name krawl \ + ghcr.io/blessedrebus/krawl:latest +``` + +## robots.txt +The actual (juicy) robots.txt configuration is the following + +```txt +Disallow: /admin/ +Disallow: /api/ +Disallow: /backup/ +Disallow: /config/ +Disallow: /database/ +Disallow: /private/ +Disallow: /uploads/ +Disallow: /wp-admin/ +Disallow: /phpMyAdmin/ +Disallow: /admin/login.php +Disallow: /api/v1/users +Disallow: /api/v2/secrets +Disallow: /.env +Disallow: /credentials.txt +Disallow: /passwords.txt +Disallow: /.git/ +Disallow: /backup.sql +Disallow: /db_backup.sql +``` + +## Honeypot pages +Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing). + +
+ +
+ +Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with “interesting” files, each assigned a random file size to look realistic. + +![directory-page](img/directory-page.png) + +The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a “juicy” misconfiguration that crawlers and scanners often flag as information leakage. + +![env-page](img/env-page.png) + +The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format + +
+ + +
+ +The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets + +
+ + +
+ +## Customizing the Canary Token +To create a custom canary token, visit https://canarytokens.org + +and generate a “Web bug” canary token. + +This optional token is triggered when a crawler fully traverses the webpage until it reaches 0. At that point, a URL is returned. When this URL is requested, it sends an alert to the user via email, including the visitor’s IP address and user agent. + + +To enable this feature, set the canary token URL [using the environment variable](#configuration-via-environment-variables) `CANARY_TOKEN_URL`. + +## Customizing the wordlist + +Edit `wordlists.json` to customize fake data for your use case + +```json +{ + "usernames": { + "prefixes": ["admin", "root", "user"], + "suffixes": ["_prod", "_dev", "123"] + }, + "passwords": { + "prefixes": ["P@ssw0rd", "Admin"], + "simple": ["test", "password"] + }, + "directory_listing": { + "files": ["credentials.txt", "backup.sql"], + "directories": ["admin/", "backup/"] + } +} +``` + +or **values.yaml** in the case of helm chart installation + +## Dashboard + +Access the dashboard at `http://:/` + +The dashboard shows: +- Total and unique accesses +- Suspicious activity detection +- Top IPs, paths, and user-agents +- Real-time monitoring + +The attackers' triggered honeypot path and the suspicious activity (such as failed login attempts) are logged + +![dashboard-1](img/dashboard-1.png) + +The top IP Addresses is shown along with top paths and User Agents + +![dashboard-2](img/dashboard-2.png) + +### Retrieving Dashboard Path + +Check server startup logs or get the secret with + +```bash +kubectl get secret krawl-server -n krawl-system \ + -o jsonpath='{.data.dashboard-path}' | base64 -d && echo +``` + +## 🤝 Contributing + +Contributions welcome! Please: +1. Fork the repository +2. Create a feature branch +3. Make your changes +4. Submit a pull request (explain the changes!) + + +
+ +## ⚠️ Disclaimer + +**This is a deception/honeypot system.** +Deploy in isolated environments and monitor carefully for security events. +Use responsibly and in compliance with applicable laws and regulations. + +## Star History +Star History Chart diff --git a/helm/Chart.yaml b/helm/Chart.yaml index 288225d..028a9f3 100644 --- a/helm/Chart.yaml +++ b/helm/Chart.yaml @@ -2,7 +2,7 @@ apiVersion: v2 name: krawl-chart description: A Helm chart for Krawl honeypot server type: application -version: 0.1.3 +version: 0.1.4 appVersion: 0.1.6 keywords: - honeypot diff --git a/src/config.py b/src/config.py index 1a9dbc2..df83380 100644 --- a/src/config.py +++ b/src/config.py @@ -111,13 +111,40 @@ class Config: attack_urls_threshold=analyzer.get('attack_urls_threshold', 1) ) +def __get_env_from_config(config: str) -> str: + + env = config.upper().replace('.', '_').replace('-', '__').replace(' ', '_') + + return f'KRAWL_{env}' + +def override_config_from_env(config: Config = None): + """Initialize configuration from environment variables""" + + for field in config.__dataclass_fields__: + + env_var = __get_env_from_config(field) + if env_var in os.environ: + field_type = config.__dataclass_fields__[field].type + env_value = os.environ[env_var] + if field_type == int: + setattr(config, field, int(env_value)) + elif field_type == float: + setattr(config, field, float(env_value)) + elif field_type == Tuple[int, int]: + parts = env_value.split(',') + if len(parts) == 2: + setattr(config, field, (int(parts[0]), int(parts[1]))) + else: + setattr(config, field, env_value) _config_instance = None - def get_config() -> Config: """Get the singleton Config instance""" global _config_instance if _config_instance is None: _config_instance = Config.from_yaml() - return _config_instance + + override_config_from_env(_config_instance) + + return _config_instance \ No newline at end of file
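The override logic added to `src/config.py` can be tried in isolation with the sketch below. It is a minimal illustration under stated assumptions, not the code from this change. `DemoConfig` is a hypothetical stand-in that mirrors only a few of the documented fields. `env_name` and `override_from_env` are illustrative names. The environment is passed in as a mapping so the behaviour can be checked without touching the real process environment. The name mapping and the `int` / `float` / `min,max` handling follow the diff above, including the choice to leave a range unchanged when the value does not split into exactly two parts.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional, Tuple


@dataclass
class DemoConfig:
    """Hypothetical stand-in for Krawl's Config; only a few fields are mirrored."""
    port: int = 5000
    delay: int = 100
    http_risky_methods_threshold: float = 0.1
    links_length_range: Tuple[int, int] = (5, 15)
    server_header: str = ""


def env_name(field_name: str) -> str:
    """Map a configuration field name to its KRAWL_-prefixed variable name."""
    env = field_name.upper().replace('.', '_').replace('-', '__').replace(' ', '_')
    return f'KRAWL_{env}'


def override_from_env(config, environ: Optional[Mapping[str, str]] = None):
    """Overwrite dataclass fields with values found in `environ`."""
    if environ is None:
        environ = os.environ
    for f in fields(config):
        name = env_name(f.name)
        if name not in environ:
            continue  # no override requested for this field
        raw = environ[name]
        if f.type is int:
            setattr(config, f.name, int(raw))
        elif f.type is float:
            setattr(config, f.name, float(raw))
        elif f.type == Tuple[int, int]:
            parts = raw.split(',')
            if len(parts) == 2:  # anything else is ignored, as in the diff
                setattr(config, f.name, (int(parts[0]), int(parts[1])))
        else:
            setattr(config, f.name, raw)  # strings are stored unchanged
    return config


if __name__ == "__main__":
    demo_env = {"KRAWL_PORT": "8080", "KRAWL_LINKS_LENGTH_RANGE": "3,20"}
    print(override_from_env(DemoConfig(), demo_env))
```

Two details are worth keeping in mind when adapting this. The type checks compare against the annotation objects themselves, so they assume the annotations are not stored as strings (for example through `from __future__ import annotations`). A malformed numeric value such as `KRAWL_PORT=abc` raises `ValueError` from `int()`, which a caller may want to catch and report.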