diff --git a/README.md b/README.md index 87a9af9..e30561e 100644 --- a/README.md +++ b/README.md @@ -39,11 +39,11 @@ - [Demo](#demo) - [What is Krawl?](#what-is-krawl) - [Krawl Dashboard](#krawl-dashboard) -- [Installation](#-installation) +- [Quickstart](#quickstart) - [Docker Run](#docker-run) - [Docker Compose](#docker-compose) - [Kubernetes](#kubernetes) - - [Local (Python)](#local-python) + - [Uvicorn (Python)](#uvicorn-python) - [Configuration](#configuration) - [config.yaml](#configuration-via-configyaml) - [Environment Variables](#configuration-via-enviromental-variables) @@ -51,7 +51,7 @@ - [IP Reputation](#ip-reputation) - [Forward Server Header](#forward-server-header) - [Additional Documentation](#additional-documentation) -- [Contributing](#-contributing) +- [Contributing](#contributing) ## Demo Tip: crawl the `robots.txt` paths for additional fun @@ -88,24 +88,29 @@ You can easily expose Krawl alongside your other services to shield them from we Krawl provides a comprehensive dashboard, accessible at a **random secret path** generated at startup or at a **custom path** configured via `KRAWL_DASHBOARD_SECRET_PATH`. This keeps the dashboard hidden from attackers scanning your honeypot. -The dashboard is organized in three main tabs: +The dashboard is organized in five tabs: -- **Overview** — High-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths. +- **Overview**: high-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths. ![geoip](img/geoip_dashboard.png) -- **Attacks** — Detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables. +- **Attacks**: detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables. 
![attack_types](img/attack_types.png) -- **IP Insight** — In-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history. +- **IP Insight**: in-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history. ![ipinsight](img/ip_insight_dashboard.png) +Additionally, after authenticating with the dashboard password, two protected tabs become available: + +- **Tracked IPs**: maintain a watchlist of IP addresses you want to monitor over time. +- **IP Banlist**: manage IP bans, view detected attackers, and export the banlist in raw or IPTables format. + For more details, see the [Dashboard documentation](docs/dashboard.md). -## 🚀 Installation +## Quickstart ### Docker Run @@ -139,6 +144,7 @@ services: environment: - CONFIG_LOCATION=config.yaml - TZ=Europe/Rome + # - KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard" # - KRAWL_DASHBOARD_PASSWORD=my-secret-password volumes: - ./config.yaml:/app/config.yaml:ro @@ -166,7 +172,7 @@ docker-compose down ### Kubernetes **Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [using the helm chart](helm/README.md). -### Python + Uvicorn +### Uvicorn (Python) Run Krawl directly with Python (suggested version 13) and uvicorn for local development or testing: @@ -307,7 +313,7 @@ location / { | [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings | | [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard | -## 🤝 Contributing +## Contributing Contributions welcome! Please: 1. 
Fork the repository diff --git a/docs/dashboard.md b/docs/dashboard.md index ace7955..0c5b52f 100644 --- a/docs/dashboard.md +++ b/docs/dashboard.md @@ -2,20 +2,182 @@ Access the dashboard at `http://:/` -The dashboard shows: -- Total and unique accesses -- Suspicious activity and attack detection -- Top IPs, paths, user-agents and GeoIP localization -- Real-time monitoring +The Krawl dashboard is a single-page application with **5 tabs**: Overview, Attacks, IP Insight, Tracked IPs, and IP Banlist. The last two tabs are only visible after authenticating with the dashboard password. -The attackers' access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged. +--- -Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website. +## Overview -![dashboard-1](../img/dashboard-1.png) +The default landing page provides a high-level summary of all traffic and suspicious activity detected by Krawl. -The top IP Addresses is shown along with top paths and User Agents +### Stats Cards -![dashboard-2](../img/dashboard-2.png) +Seven metric cards are displayed at the top: -![dashboard-3](../img/dashboard-3.png) +- **Total Accesses** — total number of requests received +- **Unique IPs** — distinct IP addresses observed +- **Unique Paths** — distinct request paths +- **Suspicious Accesses** — requests flagged as suspicious +- **Honeypot Caught** — requests that hit honeypot endpoints +- **Credentials Captured** — login attempts captured by the honeypot +- **Unique Attackers** — distinct IPs classified as attackers + +### Search + +A real-time search bar lets you search across attacks, IPs, patterns, and locations. Results are loaded dynamically as you type. + +### IP Origins Map + +An interactive world map (powered by Leaflet) displays the geolocation of top IP addresses. 
You can filter by category: + +- Attackers +- Bad Crawlers +- Good Crawlers +- Regular Users +- Unknown + +The number of displayed IPs is configurable (top 10, 100, 1,000, or all). + +![Overview — Stats and Map](../img/geoip_dashboard.png) + +### Recent Suspicious Activity + +A table showing the last 10 suspicious requests with IP address, path, user-agent, and timestamp. Each entry provides actions to view the raw HTTP request or inspect the IP in detail. + +### Top IP Addresses + +A paginated, sortable table ranking IPs by access count. Each IP shows its category badge and can be clicked to expand inline details or open the IP Insight tab. + +### Top Paths + +A paginated table of the most accessed HTTP paths and their request counts. + +### Top User-Agents + +A paginated table of the most common user-agent strings with their frequency. + +![Overview — Tables](../img/overview_tables_dashboard.png) + +--- + +## Attacks + +The Attacks tab focuses on detected malicious activity, attack patterns, and captured credentials. + +### Attackers by Total Requests + +A paginated table listing all detected attackers ranked by total requests. Columns include IP, total requests, first seen, last seen, and location. Sortable by multiple fields. + +![Attacks — Attackers and Credentials](../img/top_attackers_dashboard.png) + +### Captured Credentials + +A table of usernames and passwords captured from honeypot login forms, with timestamps. Useful for analyzing common credential stuffing patterns. + +### Honeypot Triggers by IP + +Shows which IPs accessed honeypot endpoints and how many times, sorted by trigger count. + +### Detected Attack Types + +A detailed table of individual attack detections showing IP, path, attack type classifications, user-agent, and timestamp. Each entry can be expanded to view the raw HTTP request. 
+ +### Most Recurring Attack Types + +A Chart.js visualization showing the frequency distribution of detected attack categories (e.g., SQL injection, path traversal, XSS). + +### Most Recurring Attack Patterns + +A paginated table of specific attack patterns and their occurrence counts across all traffic. + +![Attacks — Attack Types and Patterns](../img/attack_types_dashboard.png) + +--- + +## IP Insight + +The IP Insight tab provides a deep-dive view for a single IP address. It is activated by clicking "Inspect IP" from any table in the dashboard. + +### IP Information Card + +Displays comprehensive details about the selected IP: + +- **Activity** — total requests, first seen, last seen, last analysis timestamp +- **Geo & Network** — location, region, timezone, ISP, ASN, reverse DNS +- **Category** — classification badge (Attacker, Good Crawler, Bad Crawler, Regular User, Unknown) + +### Ban & Track Actions + +When authenticated, admin actions are available: + +- **Ban/Unban** — immediately add or remove the IP from the banlist +- **Track/Untrack** — add the IP to your watchlist for ongoing monitoring + +### Blocklist Memberships + +Shows which threat intelligence blocklists the IP appears on, providing external reputation context. + +### Access Logs + +A filtered view of all requests made by this specific IP, with full request details. + +![IP Insight — Detail View](../img/ip_insight_dashboard.png) + +--- + +## Tracked IPs + +> Requires authentication with the dashboard password. + +The Tracked IPs tab lets you maintain a watchlist of IP addresses you want to monitor over time. + +### Track New IP + +A form to add any IP address to your tracking list for ongoing observation. + +### Currently Tracked IPs + +A paginated table of all manually tracked IPs, with the option to untrack each one. + +![Tracked IPs](../img/tracked_ips_dashboard.png) + +--- + +## IP Banlist + +> Requires authentication with the dashboard password. 
+ +The IP Banlist tab provides tools for managing IP bans. Bans are exported every 5 minutes. + +### Force Ban IP + +A form to immediately ban any IP address by entering it manually. + +### Detected Attackers + +A paginated list of all IPs detected as attackers, with quick-ban actions for each entry. + +![IP Banlist — Detected](../img/banlist_attackers_dashboard.png) + +### Active Ban Overrides + +A table of currently active manual ban overrides, with options to unban or reset the override status for each IP. + +![IP Banlist — Overrides](../img/banlist_overrides_dashboard.png) + +### Export Banlist + +A dropdown menu to download the current banlist in two formats: + +- **Raw IPs List** — plain text, one IP per line +- **IPTables Rules** — ready-to-use firewall rules + +--- + +## Authentication + +The dashboard uses session-based authentication with secure HTTP-only cookies. Protected features (Tracked IPs, IP Banlist, ban/track actions) require entering the dashboard password. The login includes brute-force protection with IP-based rate limiting and exponential backoff. + +Click the lock icon in the top-right corner of the navigation bar to authenticate or log out. + +![Authentication Modal](../img/auth_prompt.png) diff --git a/helm/Chart.yaml b/helm/Chart.yaml index b66537f..bf0a7bb 100644 --- a/helm/Chart.yaml +++ b/helm/Chart.yaml @@ -2,8 +2,8 @@ apiVersion: v2 name: krawl-chart description: A Helm chart for Krawl honeypot server type: application -version: 1.1.7 -appVersion: 1.1.7 +version: 1.2.0 +appVersion: 1.2.0 keywords: - honeypot - security diff --git a/helm/README.md b/helm/README.md index efdfeb1..8b9bf57 100644 --- a/helm/README.md +++ b/helm/README.md @@ -14,7 +14,7 @@ A Helm chart for deploying the Krawl honeypot application on Kubernetes. 
```bash helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \ - --version 1.1.3 \ + --version 1.2.0 \ --namespace krawl-system \ --create-namespace \ -f values.yaml # optional @@ -170,7 +170,7 @@ kubectl get secret krawl-server -n krawl-system \ You can override individual values with `--set` without a values file: ```bash -helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \ +helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 \ --set ingress.hosts[0].host=honeypot.example.com \ --set config.canary.token_url=https://canarytokens.com/your-token ``` @@ -178,7 +178,7 @@ helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \ ## Upgrading ```bash -helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 -f values.yaml +helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 -f values.yaml ``` ## Uninstalling diff --git a/helm/values.yaml b/helm/values.yaml index 0c2e529..4788165 100644 --- a/helm/values.yaml +++ b/helm/values.yaml @@ -3,7 +3,7 @@ replicaCount: 1 image: repository: ghcr.io/blessedrebus/krawl pullPolicy: Always - tag: "1.1.3" + tag: "1.2.0" imagePullSecrets: [] nameOverride: "krawl" diff --git a/img/attack_types_dashboard.png b/img/attack_types_dashboard.png new file mode 100644 index 0000000..67b4223 Binary files /dev/null and b/img/attack_types_dashboard.png differ diff --git a/img/auth_prompt.png b/img/auth_prompt.png new file mode 100644 index 0000000..934b7e6 Binary files /dev/null and b/img/auth_prompt.png differ diff --git a/img/banlist_attackers_dashboard.png b/img/banlist_attackers_dashboard.png new file mode 100644 index 0000000..7e12d13 Binary files /dev/null and b/img/banlist_attackers_dashboard.png differ diff --git a/img/banlist_overrides_dashboard.png b/img/banlist_overrides_dashboard.png new file mode 100644 index 0000000..e630e86 Binary files /dev/null and b/img/banlist_overrides_dashboard.png differ diff --git a/img/dashboard-1.png 
b/img/dashboard-1.png
deleted file mode 100644
index 4479914..0000000
Binary files a/img/dashboard-1.png and /dev/null differ
diff --git a/img/dashboard-2.png b/img/dashboard-2.png
deleted file mode 100644
index e6a208d..0000000
Binary files a/img/dashboard-2.png and /dev/null differ
diff --git a/img/dashboard-3.png b/img/dashboard-3.png
deleted file mode 100644
index e7b24df..0000000
Binary files a/img/dashboard-3.png and /dev/null differ
diff --git a/img/database.png b/img/database.png
deleted file mode 100644
index fea8b4f..0000000
Binary files a/img/database.png and /dev/null differ
diff --git a/img/geoip_dashboard.png b/img/geoip_dashboard.png
index 5a4f389..e046ffd 100644
Binary files a/img/geoip_dashboard.png and b/img/geoip_dashboard.png differ
diff --git a/img/overview_tables_dashboard.png b/img/overview_tables_dashboard.png
new file mode 100644
index 0000000..5557101
Binary files /dev/null and b/img/overview_tables_dashboard.png differ
diff --git a/img/top_attackers_dashboard.png b/img/top_attackers_dashboard.png
new file mode 100644
index 0000000..b86c490
Binary files /dev/null and b/img/top_attackers_dashboard.png differ
diff --git a/img/tracked_ips_dashboard.png b/img/tracked_ips_dashboard.png
new file mode 100644
index 0000000..13e991e
Binary files /dev/null and b/img/tracked_ips_dashboard.png differ
diff --git a/src/dashboard_cache.py b/src/dashboard_cache.py
new file mode 100644
index 0000000..c0dcd7f
--- /dev/null
+++ b/src/dashboard_cache.py
@@ -0,0 +1,32 @@
+"""
+In-memory cache for dashboard Overview data.
+
+A background task periodically refreshes this cache so the dashboard
+serves pre-computed data instantly instead of hitting SQLite cold.
+
+Memory footprint is fixed — each key is overwritten on every refresh.
+""" + +import threading +from typing import Any, Dict, Optional + +_lock = threading.Lock() +_cache: Dict[str, Any] = {} + + +def get_cached(key: str) -> Optional[Any]: + """Get a value from the dashboard cache.""" + with _lock: + return _cache.get(key) + + +def set_cached(key: str, value: Any) -> None: + """Set a value in the dashboard cache.""" + with _lock: + _cache[key] = value + + +def is_warm() -> bool: + """Check if the cache has been populated at least once.""" + with _lock: + return "stats" in _cache diff --git a/src/routes/api.py b/src/routes/api.py index 11ee8ce..80873fc 100644 --- a/src/routes/api.py +++ b/src/routes/api.py @@ -18,6 +18,7 @@ from pydantic import BaseModel from dependencies import get_db, get_client_ip from logger import get_app_logger +from dashboard_cache import get_cached, is_warm # Server-side session token store (valid tokens for authenticated sessions) _auth_tokens: set = set() @@ -249,10 +250,22 @@ async def all_ips( sort_by: str = Query("total_requests"), sort_order: str = Query("desc"), ): - db = get_db() page = max(1, page) page_size = min(max(1, page_size), 10000) + # Serve from cache on default map request (top 100 IPs) + if ( + page == 1 + and page_size == 100 + and sort_by == "total_requests" + and sort_order == "desc" + and is_warm() + ): + cached = get_cached("map_ips") + if cached: + return JSONResponse(content=cached, headers=_no_cache_headers()) + + db = get_db() try: result = db.get_all_ips_paginated( page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order diff --git a/src/routes/dashboard.py b/src/routes/dashboard.py index 081336c..37f9d51 100644 --- a/src/routes/dashboard.py +++ b/src/routes/dashboard.py @@ -10,6 +10,7 @@ from fastapi.responses import JSONResponse from logger import get_app_logger from dependencies import get_db, get_templates +from dashboard_cache import get_cached, is_warm router = APIRouter() @@ -17,17 +18,19 @@ router = APIRouter() @router.get("") @router.get("/") async def 
dashboard_page(request: Request): - db = get_db() config = request.app.state.config dashboard_path = "/" + config.dashboard_secret_path.lstrip("/") - # Get initial data for server-rendered sections - stats = db.get_dashboard_counts() - suspicious = db.get_recent_suspicious(limit=10) - - # Get credential count for the stats card - cred_result = db.get_credentials_paginated(page=1, page_size=1) - stats["credential_count"] = cred_result["pagination"]["total"] + # Serve from pre-computed cache when available, fall back to live queries + if is_warm(): + stats = get_cached("stats") + suspicious = get_cached("suspicious") + else: + db = get_db() + stats = db.get_dashboard_counts() + suspicious = db.get_recent_suspicious(limit=10) + cred_result = db.get_credentials_paginated(page=1, page_size=1) + stats["credential_count"] = cred_result["pagination"]["total"] templates = get_templates() return templates.TemplateResponse( diff --git a/src/routes/htmx.py b/src/routes/htmx.py index 98373e7..7452e8a 100644 --- a/src/routes/htmx.py +++ b/src/routes/htmx.py @@ -10,6 +10,7 @@ from fastapi.responses import HTMLResponse from dependencies import get_db, get_templates from routes.api import verify_auth +from dashboard_cache import get_cached, is_warm router = APIRouter() @@ -58,10 +59,19 @@ async def htmx_top_ips( sort_by: str = Query("count"), sort_order: str = Query("desc"), ): - db = get_db() - result = db.get_top_ips_paginated( - page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order + # Serve from cache on default first-page request + cached = ( + get_cached("top_ips") + if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm()) + else None ) + if cached: + result = cached + else: + db = get_db() + result = db.get_top_ips_paginated( + page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order + ) templates = get_templates() return templates.TemplateResponse( @@ -87,10 +97,18 @@ async def htmx_top_paths( sort_by: str = Query("count"), 
sort_order: str = Query("desc"), ): - db = get_db() - result = db.get_top_paths_paginated( - page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order + cached = ( + get_cached("top_paths") + if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm()) + else None ) + if cached: + result = cached + else: + db = get_db() + result = db.get_top_paths_paginated( + page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order + ) templates = get_templates() return templates.TemplateResponse( @@ -116,10 +134,18 @@ async def htmx_top_ua( sort_by: str = Query("count"), sort_order: str = Query("desc"), ): - db = get_db() - result = db.get_top_user_agents_paginated( - page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order + cached = ( + get_cached("top_ua") + if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm()) + else None ) + if cached: + result = cached + else: + db = get_db() + result = db.get_top_user_agents_paginated( + page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order + ) templates = get_templates() return templates.TemplateResponse( diff --git a/src/tasks/dashboard_warmup.py b/src/tasks/dashboard_warmup.py new file mode 100644 index 0000000..3734864 --- /dev/null +++ b/src/tasks/dashboard_warmup.py @@ -0,0 +1,68 @@ +# tasks/dashboard_warmup.py + +""" +Pre-computes all Overview tab data and stores it in the in-memory cache. +This keeps SQLite page buffers warm and lets the dashboard respond instantly. 
+"""
+
+from logger import get_app_logger
+from database import get_database
+from dashboard_cache import set_cached
+
+app_logger = get_app_logger()
+
+# ----------------------
+# TASK CONFIG
+# ----------------------
+TASK_CONFIG = {
+    "name": "dashboard-warmup",
+    "cron": "*/1 * * * *",
+    "enabled": True,
+    "run_when_loaded": True,
+}
+
+
+# ----------------------
+# TASK LOGIC
+# ----------------------
+def main():
+    """
+    Refresh the in-memory dashboard cache with current Overview data.
+    TasksMaster will call this function based on the cron schedule.
+    """
+    task_name = TASK_CONFIG.get("name")
+    app_logger.info(f"[Background Task] {task_name} starting...")
+
+    try:
+        db = get_database()
+
+        # --- Server-rendered data (stats cards + suspicious table) ---
+        stats = db.get_dashboard_counts()
+
+        cred_result = db.get_credentials_paginated(page=1, page_size=1)
+        stats["credential_count"] = cred_result["pagination"]["total"]
+
+        suspicious = db.get_recent_suspicious(limit=10)
+
+        # --- HTMX Overview tables (first page, default sort) ---
+        top_ips = db.get_top_ips_paginated(page=1, page_size=8)
+        top_ua = db.get_top_user_agents_paginated(page=1, page_size=5)
+        top_paths = db.get_top_paths_paginated(page=1, page_size=5)
+
+        # --- Map data (default: top 100 IPs by total_requests) ---
+        map_ips = db.get_all_ips_paginated(
+            page=1, page_size=100, sort_by="total_requests", sort_order="desc"
+        )
+
+        # Store everything in the cache (overwrites previous values)
+        set_cached("stats", stats)
+        set_cached("suspicious", suspicious)
+        set_cached("top_ips", top_ips)
+        set_cached("top_ua", top_ua)
+        set_cached("top_paths", top_paths)
+        set_cached("map_ips", map_ips)
+
+        app_logger.info(f"[Background Task] {task_name} cache refreshed successfully.")
+
+    except Exception as e:
+        app_logger.error(f"[Background Task] {task_name} failed: {e}")
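
Review note: the read path this diff introduces (serve the default first-page request from the warm cache, fall back to a live query otherwise) can be exercised in isolation. The following is a minimal sketch rather than the project's code. The three cache functions mirror `src/dashboard_cache.py` from this diff, while `FakeDB`, `warm_cache`, and `serve_top_ips` are hypothetical names introduced only for illustration. One deliberate difference, stated as an assumption: the sketch writes the `stats` key last, so that `is_warm()` returning `True` implies the other keys are already present. The task in the diff writes `stats` first.

```python
import threading
from typing import Any, Dict, Optional

# Same shape as src/dashboard_cache.py in the diff.
_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def get_cached(key: str) -> Optional[Any]:
    """Get a value from the cache, or None if the key is missing."""
    with _lock:
        return _cache.get(key)


def set_cached(key: str, value: Any) -> None:
    """Overwrite a cache entry."""
    with _lock:
        _cache[key] = value


def is_warm() -> bool:
    """The cache counts as warm once the "stats" key exists."""
    with _lock:
        return "stats" in _cache


class FakeDB:
    """Hypothetical stand-in for the real database; counts live queries."""

    def __init__(self) -> None:
        self.queries = 0

    def get_top_ips_paginated(
        self, page: int = 1, page_size: int = 8,
        sort_by: str = "count", sort_order: str = "desc",
    ) -> Dict[str, Any]:
        self.queries += 1
        return {"page": page, "sort_by": sort_by, "sort_order": sort_order}


def warm_cache(db: FakeDB) -> None:
    """Hypothetical warm-up step: compute the default view, then publish it."""
    top_ips = db.get_top_ips_paginated(page=1, page_size=8)
    set_cached("top_ips", top_ips)
    # Assumption of this sketch: "stats" is written last, so a warm cache
    # implies every other key has already been stored.
    set_cached("stats", {"total_accesses": 0})


def serve_top_ips(
    db: FakeDB, page: int = 1, sort_by: str = "count", sort_order: str = "desc"
) -> Dict[str, Any]:
    """Serve the default first page from cache; query live for anything else."""
    is_default = page == 1 and sort_by == "count" and sort_order == "desc"
    cached = get_cached("top_ips") if (is_default and is_warm()) else None
    if cached:
        return cached
    return db.get_top_ips_paginated(
        page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order
    )
```

With this sketch, a call to `serve_top_ips(db)` before `warm_cache(db)` runs performs a live query, the same call afterwards performs none, and any non-default `page`, `sort_by`, or `sort_order` always goes to the database.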