Merge pull request #127 from BlessedRebuS/feat/release-1.2

Feat/release 1.2
2026-03-10 11:31:18 +01:00
parent 56345974c8 da3ffd64c9
commit 1b8dc53952
22 changed files with 355 additions and 45 deletions
--- a/README.md
+++ b/README.md
@@ -39,11 +39,11 @@
 - [Demo](#demo)
 - [What is Krawl?](#what-is-krawl)
 - [Krawl Dashboard](#krawl-dashboard)
- [Installation](#-installation)
+- [Quickstart](#quickstart)
  - [Docker Run](#docker-run)
  - [Docker Compose](#docker-compose)
  - [Kubernetes](#kubernetes)
-  - [Local (Python)](#local-python)
+  - [Uvicorn (Python)](#uvicorn-python)
 - [Configuration](#configuration)
  - [config.yaml](#configuration-via-configyaml)
  - [Environment Variables](#configuration-via-enviromental-variables)
@@ -51,7 +51,7 @@
 - [IP Reputation](#ip-reputation)
 - [Forward Server Header](#forward-server-header)
 - [Additional Documentation](#additional-documentation)
- [Contributing](#-contributing)
+- [Contributing](#contributing)
 ## Demo
 Tip: crawl the `robots.txt` paths for additional fun
@@ -88,24 +88,29 @@ You can easily expose Krawl alongside your other services to shield them from we
 Krawl provides a comprehensive dashboard, accessible at a **random secret path** generated at startup or at a **custom path** configured via `KRAWL_DASHBOARD_SECRET_PATH`. This keeps the dashboard hidden from attackers scanning your honeypot.
-The dashboard is organized in three main tabs:
+The dashboard is organized into five tabs. Three are always visible:
- **Overview** — High-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths.
+- **Overview**: a high-level view of attack activity, including an interactive map of IP origins, recent suspicious requests, and the top IPs, User-Agents, and paths.
 ![geoip](img/geoip_dashboard.png)
- **Attacks** — Detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables.
+- **Attacks**: a detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.), with charts and tables.
 ![attack_types](img/attack_types.png)
- **IP Insight** — In-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.
+- **IP Insight**: an in-depth forensic view of a selected IP, covering geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.
 ![ipinsight](img/ip_insight_dashboard.png)
 Additionally, after authenticating with the dashboard password, two protected tabs become available:
 - **Tracked IPs**: maintain a watchlist of IP addresses you want to monitor over time.
 - **IP Banlist**: manage IP bans, view detected attackers, and export the banlist in raw or IPTables format.
 For more details, see the [Dashboard documentation](docs/dashboard.md).
-## 🚀 Installation
+## Quickstart
 ### Docker Run
@@ -139,6 +144,7 @@ services:
    environment:
      - CONFIG_LOCATION=config.yaml
      - TZ=Europe/Rome
      # - KRAWL_DASHBOARD_SECRET_PATH=/my-secret-dashboard
      # - KRAWL_DASHBOARD_PASSWORD=my-secret-password
    volumes:
      - ./config.yaml:/app/config.yaml:ro
@@ -166,7 +172,7 @@ docker-compose down
 ### Kubernetes
 **Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [using the helm chart](helm/README.md).
-### Python + Uvicorn
+### Uvicorn (Python)
 Run Krawl directly with Python (version 3.13 suggested) and uvicorn for local development or testing:
@@ -307,7 +313,7 @@ location / {
 | [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings |
 | [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard |
-## 🤝 Contributing
+## Contributing
 Contributions welcome! Please:
 1. Fork the repository
--- a/docs/dashboard.md
+++ b/docs/dashboard.md
@@ -2,20 +2,182 @@
 Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`
-The dashboard shows:
+The Krawl dashboard is a single-page application with **5 tabs**: Overview, Attacks, IP Insight, Tracked IPs, and IP Banlist. The last two tabs are only visible after authenticating with the dashboard password.
 - Total and unique accesses
 - Suspicious activity and attack detection
 - Top IPs, paths, user-agents and GeoIP localization
 - Real-time monitoring
-The attackers' access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
-Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
-![dashboard-1](../img/dashboard-1.png)
-The top IP Addresses is shown along with top paths and User Agents
-![dashboard-2](../img/dashboard-2.png)
-![dashboard-3](../img/dashboard-3.png)
+---
+## Overview
+The default landing page provides a high-level summary of all traffic and suspicious activity detected by Krawl.
+### Stats Cards
+Seven metric cards are displayed at the top:
+- **Total Accesses** — total number of requests received
 - **Unique IPs** — distinct IP addresses observed
 - **Unique Paths** — distinct request paths
 - **Suspicious Accesses** — requests flagged as suspicious
 - **Honeypot Caught** — requests that hit honeypot endpoints
 - **Credentials Captured** — login attempts captured by the honeypot
 - **Unique Attackers** — distinct IPs classified as attackers
 ### Search
 A real-time search bar lets you search across attacks, IPs, patterns, and locations. Results are loaded dynamically as you type.
 ### IP Origins Map
 An interactive world map (powered by Leaflet) displays the geolocation of top IP addresses. You can filter by category:
 - Attackers
 - Bad Crawlers
 - Good Crawlers
 - Regular Users
 - Unknown
 The number of displayed IPs is configurable (top 10, 100, 1,000, or all).
 ![Overview — Stats and Map](../img/geoip_dashboard.png)
 ### Recent Suspicious Activity
 A table showing the last 10 suspicious requests with IP address, path, user-agent, and timestamp. Each entry provides actions to view the raw HTTP request or inspect the IP in detail.
 ### Top IP Addresses
 A paginated, sortable table ranking IPs by access count. Each IP shows its category badge and can be clicked to expand inline details or open the IP Insight tab.
 ### Top Paths
 A paginated table of the most accessed HTTP paths and their request counts.
 ### Top User-Agents
 A paginated table of the most common user-agent strings with their frequency.
 ![Overview — Tables](../img/overview_tables_dashboard.png)
 ---
 ## Attacks
 The Attacks tab focuses on detected malicious activity, attack patterns, and captured credentials.
 ### Attackers by Total Requests
 A paginated table listing all detected attackers ranked by total requests. Columns include IP, total requests, first seen, last seen, and location. Sortable by multiple fields.
 ![Attacks — Attackers and Credentials](../img/top_attackers_dashboard.png)
 ### Captured Credentials
 A table of usernames and passwords captured from honeypot login forms, with timestamps. Useful for analyzing common credential stuffing patterns.
 ### Honeypot Triggers by IP
 Shows which IPs accessed honeypot endpoints and how many times, sorted by trigger count.
 ### Detected Attack Types
 A detailed table of individual attack detections showing IP, path, attack type classifications, user-agent, and timestamp. Each entry can be expanded to view the raw HTTP request.
 ### Most Recurring Attack Types
 A Chart.js visualization showing the frequency distribution of detected attack categories (e.g., SQL injection, path traversal, XSS).
 ### Most Recurring Attack Patterns
 A paginated table of specific attack patterns and their occurrence counts across all traffic.
 ![Attacks — Attack Types and Patterns](../img/attack_types_dashboard.png)
 ---
 ## IP Insight
 The IP Insight tab provides a deep-dive view for a single IP address. It is activated by clicking "Inspect IP" from any table in the dashboard.
 ### IP Information Card
 Displays comprehensive details about the selected IP:
 - **Activity** — total requests, first seen, last seen, last analysis timestamp
 - **Geo & Network** — location, region, timezone, ISP, ASN, reverse DNS
 - **Category** — classification badge (Attacker, Good Crawler, Bad Crawler, Regular User, Unknown)
 ### Ban & Track Actions
 When authenticated, admin actions are available:
 - **Ban/Unban** — immediately add or remove the IP from the banlist
 - **Track/Untrack** — add the IP to your watchlist for ongoing monitoring
 ### Blocklist Memberships
 Shows which threat intelligence blocklists the IP appears on, providing external reputation context.
 ### Access Logs
 A filtered view of all requests made by this specific IP, with full request details.
 ![IP Insight — Detail View](../img/ip_insight_dashboard.png)
 ---
 ## Tracked IPs
 > Requires authentication with the dashboard password.
 The Tracked IPs tab lets you maintain a watchlist of IP addresses you want to monitor over time.
 ### Track New IP
 A form to add any IP address to your tracking list for ongoing observation.
 ### Currently Tracked IPs
 A paginated table of all manually tracked IPs, with the option to untrack each one.
 ![Tracked IPs](../img/tracked_ips_dashboard.png)
 ---
 ## IP Banlist
 > Requires authentication with the dashboard password.
 The IP Banlist tab provides tools for managing IP bans. Bans are exported every 5 minutes.
 ### Force Ban IP
 A form to immediately ban any IP address by entering it manually.
 ### Detected Attackers
 A paginated list of all IPs detected as attackers, with quick-ban actions for each entry.
 ![IP Banlist — Detected](../img/banlist_attackers_dashboard.png)
 ### Active Ban Overrides
 A table of currently active manual ban overrides, with options to unban or reset the override status for each IP.
 ![IP Banlist — Overrides](../img/banlist_overrides_dashboard.png)
 ### Export Banlist
 A dropdown menu to download the current banlist in two formats:
 - **Raw IPs List** — plain text, one IP per line
 - **IPTables Rules** — ready-to-use firewall rules
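The relationship between the two export formats can be sketched as follows. This is a minimal illustration, not Krawl's exporter: it assumes the raw list is one IP per line and that a `DROP` rule appended to the `INPUT` chain is the desired rule shape. The helper name `to_iptables_rules` is hypothetical, and the rules Krawl actually emits may differ.

```python
import ipaddress


def to_iptables_rules(raw_list: str, chain: str = "INPUT") -> list[str]:
    """Convert a raw banlist (one IP per line) into firewall DROP rules.

    Hypothetical helper for illustration; Krawl's real export format may differ.
    """
    rules = []
    seen = set()
    for line in raw_list.splitlines():
        candidate = line.strip()
        if not candidate or candidate.startswith("#"):
            continue  # skip blank lines and comments
        try:
            ip = ipaddress.ip_address(candidate)  # reject malformed entries
        except ValueError:
            continue
        if ip in seen:
            continue  # emit one rule per address
        seen.add(ip)
        # iptables only handles IPv4; IPv6 addresses need ip6tables
        binary = "ip6tables" if ip.version == 6 else "iptables"
        rules.append(f"{binary} -A {chain} -s {ip} -j DROP")
    return rules
```

The sketch appends rules with `-A`; using `-I` instead would insert them at the top of the chain, ahead of any existing `ACCEPT` rules.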
 ---
 ## Authentication
 The dashboard uses session-based authentication with secure HTTP-only cookies. Protected features (Tracked IPs, IP Banlist, ban/track actions) require entering the dashboard password. The login includes brute-force protection with IP-based rate limiting and exponential backoff.
 Click the lock icon in the top-right corner of the navigation bar to authenticate or log out.
 ![Authentication Modal](../img/auth_prompt.png)
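The documentation does not state the backoff parameters used by the login protection. The sketch below shows one common way to combine per-IP rate limiting with exponential backoff; the class name `LoginThrottle`, the 1-second base delay, the 300-second cap, and the reset-on-success rule are all assumptions, not Krawl's actual values.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class _Attempts:
    failures: int = 0
    locked_until: float = 0.0


@dataclass
class LoginThrottle:
    """Hypothetical per-IP login throttle with exponential backoff."""

    base_delay: float = 1.0    # lockout after the first failure (assumed)
    max_delay: float = 300.0   # upper bound on any lockout (assumed)
    clock: Callable[[], float] = time.monotonic
    _state: Dict[str, _Attempts] = field(default_factory=dict, init=False, repr=False)

    def retry_after(self, ip: str) -> float:
        """Seconds this IP must still wait before trying again (0.0 if allowed)."""
        entry = self._state.get(ip)
        if entry is None:
            return 0.0
        return max(0.0, entry.locked_until - self.clock())

    def record_failure(self, ip: str) -> float:
        """Register a failed login and return the new lockout length in seconds."""
        entry = self._state.setdefault(ip, _Attempts())
        entry.failures += 1
        # Delay doubles per consecutive failure (1s, 2s, 4s, ...); the exponent
        # is clamped so huge failure counts cannot overflow a float.
        exponent = min(entry.failures - 1, 30)
        delay = min(self.base_delay * (2 ** exponent), self.max_delay)
        entry.locked_until = self.clock() + delay
        return delay

    def record_success(self, ip: str) -> None:
        """Forget the failure history once the IP authenticates."""
        self._state.pop(ip, None)
```

A login handler would check `retry_after(ip)` before verifying the password and reject the request while it is non-zero. The injectable `clock` keeps the class testable without sleeping.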
--- a/helm/Chart.yaml
+++ b/helm/Chart.yaml
@@ -2,8 +2,8 @@ apiVersion: v2
 name: krawl-chart
 description: A Helm chart for Krawl honeypot server
 type: application
-version: 1.1.7
+version: 1.2.0
-appVersion: 1.1.7
+appVersion: 1.2.0
 keywords:
  - honeypot
  - security
--- a/helm/README.md
+++ b/helm/README.md
@@ -14,7 +14,7 @@ A Helm chart for deploying the Krawl honeypot application on Kubernetes.
 ```bash
 helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
-  --version 1.1.3 \
+  --version 1.2.0 \
  --namespace krawl-system \
  --create-namespace \
  -f values.yaml  # optional
@@ -170,7 +170,7 @@ kubectl get secret krawl-server -n krawl-system \
 You can override individual values with `--set` without a values file:
 ```bash
-helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \
+helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 \
  --set ingress.hosts[0].host=honeypot.example.com \
  --set config.canary.token_url=https://canarytokens.com/your-token
 ```
@@ -178,7 +178,7 @@ helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \
 ## Upgrading
 ```bash
-helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 -f values.yaml
+helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 -f values.yaml
 ```
 ## Uninstalling
--- a/helm/values.yaml
+++ b/helm/values.yaml
@@ -3,7 +3,7 @@ replicaCount: 1
 image:
  repository: ghcr.io/blessedrebus/krawl
  pullPolicy: Always
-  tag: "1.1.3"
+  tag: "1.2.0"
 imagePullSecrets: []
 nameOverride: "krawl"
--- a/img/attack_types_dashboard.png
+++ b/img/attack_types_dashboard.png
--- a/img/auth_prompt.png
+++ b/img/auth_prompt.png
--- a/img/banlist_attackers_dashboard.png
+++ b/img/banlist_attackers_dashboard.png
--- a/img/banlist_overrides_dashboard.png
+++ b/img/banlist_overrides_dashboard.png
--- a/img/dashboard-1.png
+++ b/img/dashboard-1.png
--- a/img/dashboard-2.png
+++ b/img/dashboard-2.png
--- a/img/dashboard-3.png
+++ b/img/dashboard-3.png
--- a/img/database.png
+++ b/img/database.png
--- a/img/geoip_dashboard.png
+++ b/img/geoip_dashboard.png
--- a/img/overview_tables_dashboard.png
+++ b/img/overview_tables_dashboard.png
--- a/img/top_attackers_dashboard.png
+++ b/img/top_attackers_dashboard.png
--- a/img/tracked_ips_dashboard.png
+++ b/img/tracked_ips_dashboard.png
--- a/src/dashboard_cache.py
+++ b/src/dashboard_cache.py
@@ -0,0 +1,32 @@
 """
 In-memory cache for dashboard Overview data.
 A background task periodically refreshes this cache so the dashboard
 serves pre-computed data instantly instead of hitting SQLite cold.
 Memory footprint is fixed — each key is overwritten on every refresh.
 """
 import threading
 from typing import Any, Dict, Optional
 _lock = threading.Lock()
 _cache: Dict[str, Any] = {}
 def get_cached(key: str) -> Optional[Any]:
    """Get a value from the dashboard cache."""
    with _lock:
        return _cache.get(key)
 def set_cached(key: str, value: Any) -> None:
    """Set a value in the dashboard cache."""
    with _lock:
        _cache[key] = value
 def is_warm() -> bool:
    """Check if the cache has been populated at least once."""
    with _lock:
        return "stats" in _cache
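The warm-up task stores each key with a separate `set_cached` call, and each call takes the lock on its own. On the very first refresh, a request can therefore see `is_warm()` return true after `"stats"` is stored but before `"suspicious"` or `"top_ips"` exist. The HTMX and map routes already fall back when `get_cached` returns `None`, but `dashboard_page` reads `suspicious` straight from the cache once the cache is warm. One possible mitigation, sketched below, is to publish all keys in a single critical section. `set_many` is a hypothetical addition, not part of this PR.

```python
import threading
from typing import Any, Dict, Mapping, Optional

_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def get_cached(key: str) -> Optional[Any]:
    """Get a value from the dashboard cache."""
    with _lock:
        return _cache.get(key)


def set_many(values: Mapping[str, Any]) -> None:
    """Hypothetical: store several keys under one lock acquisition so readers
    never observe a partially populated cache."""
    with _lock:
        _cache.update(values)


def is_warm() -> bool:
    """Check if the cache has been populated at least once."""
    with _lock:
        return "stats" in _cache
```

With this helper, the six `set_cached` calls at the end of the warm-up could become a single `set_many({...})`. Because `"stats"` then appears in the same critical section as every other key, `is_warm()` returning true would imply that all Overview keys are present.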
--- a/src/routes/api.py
+++ b/src/routes/api.py
@@ -18,6 +18,7 @@ from pydantic import BaseModel
 from dependencies import get_db, get_client_ip
 from logger import get_app_logger
 from dashboard_cache import get_cached, is_warm
 # Server-side session token store (valid tokens for authenticated sessions)
 _auth_tokens: set = set()
@@ -249,10 +250,22 @@ async def all_ips(
    sort_by: str = Query("total_requests"),
    sort_order: str = Query("desc"),
 ):
-    db = get_db()
    page = max(1, page)
    page_size = min(max(1, page_size), 10000)
    # Serve from cache on default map request (top 100 IPs)
    if (
        page == 1
        and page_size == 100
        and sort_by == "total_requests"
        and sort_order == "desc"
        and is_warm()
    ):
        cached = get_cached("map_ips")
        if cached:
            return JSONResponse(content=cached, headers=_no_cache_headers())
    db = get_db()
    try:
        result = db.get_all_ips_paginated(
            page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
--- a/src/routes/dashboard.py
+++ b/src/routes/dashboard.py
@@ -10,6 +10,7 @@ from fastapi.responses import JSONResponse
 from logger import get_app_logger
 from dependencies import get_db, get_templates
 from dashboard_cache import get_cached, is_warm
 router = APIRouter()
@@ -17,17 +18,19 @@ router = APIRouter()
@router.get("")
@router.get("/")
 async def dashboard_page(request: Request):
-    db = get_db()
    config = request.app.state.config
    dashboard_path = "/" + config.dashboard_secret_path.lstrip("/")
-    # Get initial data for server-rendered sections
-    stats = db.get_dashboard_counts()
-    suspicious = db.get_recent_suspicious(limit=10)
-
-    # Get credential count for the stats card
-    cred_result = db.get_credentials_paginated(page=1, page_size=1)
-    stats["credential_count"] = cred_result["pagination"]["total"]
+    # Serve from pre-computed cache when available, fall back to live queries
+    if is_warm():
+        stats = get_cached("stats")
+        suspicious = get_cached("suspicious")
+    else:
+        db = get_db()
+        stats = db.get_dashboard_counts()
        suspicious = db.get_recent_suspicious(limit=10)
        cred_result = db.get_credentials_paginated(page=1, page_size=1)
        stats["credential_count"] = cred_result["pagination"]["total"]
    templates = get_templates()
    return templates.TemplateResponse(
--- a/src/routes/htmx.py
+++ b/src/routes/htmx.py
@@ -10,6 +10,7 @@ from fastapi.responses import HTMLResponse
 from dependencies import get_db, get_templates
 from routes.api import verify_auth
 from dashboard_cache import get_cached, is_warm
 router = APIRouter()
@@ -58,10 +59,19 @@ async def htmx_top_ips(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
 ):
-    db = get_db()
-    result = db.get_top_ips_paginated(
-        page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order
+    # Serve from cache on default first-page request
+    cached = (
+        get_cached("top_ips")
        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
        else None
    )
    if cached:
        result = cached
    else:
        db = get_db()
        result = db.get_top_ips_paginated(
            page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order
        )
    templates = get_templates()
    return templates.TemplateResponse(
@@ -87,10 +97,18 @@ async def htmx_top_paths(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
 ):
-    db = get_db()
-    result = db.get_top_paths_paginated(
-        page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
+    cached = (
+        get_cached("top_paths")
+        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
        else None
    )
    if cached:
        result = cached
    else:
        db = get_db()
        result = db.get_top_paths_paginated(
            page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
        )
    templates = get_templates()
    return templates.TemplateResponse(
@@ -116,10 +134,18 @@ async def htmx_top_ua(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
 ):
-    db = get_db()
-    result = db.get_top_user_agents_paginated(
-        page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
+    cached = (
+        get_cached("top_ua")
+        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
        else None
    )
    if cached:
        result = cached
    else:
        db = get_db()
        result = db.get_top_user_agents_paginated(
            page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
        )
    templates = get_templates()
    return templates.TemplateResponse(
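The three HTMX handlers repeat the same pattern: serve the cached first page only for the default sort, otherwise query the database. The sketch below factors that pattern into one function. `cached_or_query` is a hypothetical helper, and the cache lookup, warm check, and live query are passed in as callables so the snippet runs without Krawl's `dashboard_cache` module or database.

```python
from typing import Any, Callable, Optional


def cached_or_query(
    cache_key: str,
    page: int,
    sort_by: str,
    sort_order: str,
    query: Callable[[], Any],
    get_cached: Callable[[str], Optional[Any]],
    is_warm: Callable[[], bool],
    default_sort: str = "count",
) -> Any:
    """Return the pre-computed first page when the request matches the
    warmed-up defaults; otherwise run the live query."""
    is_default = page == 1 and sort_by == default_sort and sort_order == "desc"
    if is_default and is_warm():
        cached = get_cached(cache_key)
        # Mirrors the routes' `if cached:` check: None or an empty result
        # means the cache is not usable, so fall through to the database.
        if cached:
            return cached
    return query()
```

In `htmx_top_paths`, for example, the call would pass `"top_paths"`, the request parameters, and a `lambda` wrapping `get_db().get_top_paths_paginated(...)`. The map route in `api.py` also checks `page_size == 100`, so it would need one extra condition before reusing this helper.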
--- a/src/tasks/dashboard_warmup.py
+++ b/src/tasks/dashboard_warmup.py
@@ -0,0 +1,68 @@
 # tasks/dashboard_warmup.py
 """
 Pre-computes all Overview tab data and stores it in the in-memory cache.
 This keeps SQLite page buffers warm and lets the dashboard respond instantly.
 """
 from logger import get_app_logger
 from database import get_database
 from dashboard_cache import set_cached
 app_logger = get_app_logger()
 # ----------------------
 # TASK CONFIG
 # ----------------------
 TASK_CONFIG = {
    "name": "dashboard-warmup",
    "cron": "*/1 * * * *",
    "enabled": True,
    "run_when_loaded": True,
 }
 # ----------------------
 # TASK LOGIC
 # ----------------------
 def main():
    """
    Refresh the in-memory dashboard cache with current Overview data.
    TasksMaster will call this function based on the cron schedule.
    """
    task_name = TASK_CONFIG.get("name")
    app_logger.info(f"[Background Task] {task_name} starting...")
    try:
        db = get_database()
        # --- Server-rendered data (stats cards + suspicious table) ---
        stats = db.get_dashboard_counts()
        cred_result = db.get_credentials_paginated(page=1, page_size=1)
        stats["credential_count"] = cred_result["pagination"]["total"]
        suspicious = db.get_recent_suspicious(limit=10)
        # --- HTMX Overview tables (first page, default sort) ---
        top_ips = db.get_top_ips_paginated(page=1, page_size=8)
        top_ua = db.get_top_user_agents_paginated(page=1, page_size=5)
        top_paths = db.get_top_paths_paginated(page=1, page_size=5)
        # --- Map data (default: top 100 IPs by total_requests) ---
        map_ips = db.get_all_ips_paginated(
            page=1, page_size=100, sort_by="total_requests", sort_order="desc"
        )
        # Store everything in the cache (overwrites previous values)
        set_cached("stats", stats)
        set_cached("suspicious", suspicious)
        set_cached("top_ips", top_ips)
        set_cached("top_ua", top_ua)
        set_cached("top_paths", top_paths)
        set_cached("map_ips", map_ips)
        app_logger.info(f"[Background Task] {task_name} cache refreshed successfully.")
    except Exception as e:
        app_logger.error(f"[Background Task] {task_name} failed: {e}")
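`main` logs failures but leaves the previous values in place. If the warm-up keeps failing, the Overview keeps serving the last successful snapshot indefinitely. One way to bound that is to record when the cache was last refreshed and treat it as cold once the snapshot is older than a few cron intervals. The names `mark_refreshed`, `is_fresh`, and `MAX_AGE_SECONDS`, and the three-minute threshold, are assumptions for illustration.

```python
import threading
import time
from typing import Optional

_lock = threading.Lock()
_last_refresh: Optional[float] = None

# Hypothetical threshold: roughly three missed runs of the "*/1 * * * *" schedule.
MAX_AGE_SECONDS = 180.0


def mark_refreshed(now: Optional[float] = None) -> None:
    """Record a successful refresh (intended for the end of main()'s try block)."""
    global _last_refresh
    with _lock:
        _last_refresh = time.monotonic() if now is None else now


def is_fresh(max_age: float = MAX_AGE_SECONDS, now: Optional[float] = None) -> bool:
    """True if the cache was refreshed within the last max_age seconds."""
    with _lock:
        last = _last_refresh
    if last is None:
        return False  # never refreshed: behave like a cold cache
    current = time.monotonic() if now is None else now
    return current - last <= max_age
```

The routes could then require `is_warm() and is_fresh()` before serving cached data, and fall back to live queries when the snapshot has gone stale.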