Merge pull request #127 from BlessedRebuS/feat/release-1.2

Feat/release 1.2
2026-03-10 11:31:18 +01:00
parent 56345974c8 da3ffd64c9
commit 1b8dc53952
22 changed files with 355 additions and 45 deletions
--- a/README.md
+++ b/README.md
@@ -39,11 +39,11 @@
 - [Demo](#demo)
 - [What is Krawl?](#what-is-krawl)
 - [Krawl Dashboard](#krawl-dashboard)
-- [Installation](#-installation)
+- [Quickstart](#quickstart)
  - [Docker Run](#docker-run)
  - [Docker Compose](#docker-compose)
  - [Kubernetes](#kubernetes)
-  - [Local (Python)](#local-python)
+  - [Uvicorn (Python)](#uvicorn-python)
 - [Configuration](#configuration)
  - [config.yaml](#configuration-via-configyaml)
  - [Environment Variables](#configuration-via-enviromental-variables)
@@ -51,7 +51,7 @@
 - [IP Reputation](#ip-reputation)
 - [Forward Server Header](#forward-server-header)
 - [Additional Documentation](#additional-documentation)
-- [Contributing](#-contributing)
+- [Contributing](#contributing)

 ## Demo
 Tip: crawl the `robots.txt` paths for additional fun
@@ -88,24 +88,29 @@ You can easily expose Krawl alongside your other services to shield them from we

 Krawl provides a comprehensive dashboard, accessible at a **random secret path** generated at startup or at a **custom path** configured via `KRAWL_DASHBOARD_SECRET_PATH`. This keeps the dashboard hidden from attackers scanning your honeypot.

-The dashboard is organized in three main tabs:
+The dashboard is organized in five tabs:

-- **Overview** — High-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths.
+- **Overview**: high-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths.

 ![geoip](img/geoip_dashboard.png)

-- **Attacks** — Detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables.
+- **Attacks**: detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables.

 ![attack_types](img/attack_types.png)

-- **IP Insight** — In-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.
+- **IP Insight**: in-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.

 ![ipinsight](img/ip_insight_dashboard.png)

+Additionally, after authenticating with the dashboard password, two protected tabs become available:
+
+- **Tracked IPs**: maintain a watchlist of IP addresses you want to monitor over time.
+- **IP Banlist**: manage IP bans, view detected attackers, and export the banlist in raw or IPTables format.
+
 For more details, see the [Dashboard documentation](docs/dashboard.md).


-## 🚀 Installation
+## Quickstart

 ### Docker Run

@@ -139,6 +144,7 @@ services:
    environment:
      - CONFIG_LOCATION=config.yaml
      - TZ=Europe/Rome
+      # - KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard"
      # - KRAWL_DASHBOARD_PASSWORD=my-secret-password
    volumes:
      - ./config.yaml:/app/config.yaml:ro
@@ -166,7 +172,7 @@ docker-compose down
 ### Kubernetes
 **Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [using the helm chart](helm/README.md).

-### Python + Uvicorn
+### Uvicorn (Python)

 Run Krawl directly with Python (suggested version 13) and uvicorn for local development or testing:

@@ -307,7 +313,7 @@ location / {
 | [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings |
 | [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard |

-## 🤝 Contributing
+## Contributing

 Contributions welcome! Please:
 1. Fork the repository
--- a/docs/dashboard.md
+++ b/docs/dashboard.md
@@ -2,20 +2,182 @@

 Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`

-The dashboard shows:
-- Total and unique accesses
-- Suspicious activity and attack detection
-- Top IPs, paths, user-agents and GeoIP localization
-- Real-time monitoring
+The Krawl dashboard is a single-page application with **5 tabs**: Overview, Attacks, IP Insight, Tracked IPs, and IP Banlist. The last two tabs are only visible after authenticating with the dashboard password.

-The attackers' access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
+---

-Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
+## Overview

-![dashboard-1](../img/dashboard-1.png)
+The default landing page provides a high-level summary of all traffic and suspicious activity detected by Krawl.

-The top IP Addresses is shown along with top paths and User Agents
+### Stats Cards

-![dashboard-2](../img/dashboard-2.png)
+Seven metric cards are displayed at the top:

-![dashboard-3](../img/dashboard-3.png)
+- **Total Accesses** — total number of requests received
+- **Unique IPs** — distinct IP addresses observed
+- **Unique Paths** — distinct request paths
+- **Suspicious Accesses** — requests flagged as suspicious
+- **Honeypot Caught** — requests that hit honeypot endpoints
+- **Credentials Captured** — login attempts captured by the honeypot
+- **Unique Attackers** — distinct IPs classified as attackers
+
+### Search
+
+A real-time search bar lets you search across attacks, IPs, patterns, and locations. Results are loaded dynamically as you type.
+
+### IP Origins Map
+
+An interactive world map (powered by Leaflet) displays the geolocation of top IP addresses. You can filter by category:
+
+- Attackers
+- Bad Crawlers
+- Good Crawlers
+- Regular Users
+- Unknown
+
+The number of displayed IPs is configurable (top 10, 100, 1,000, or all).
+
+![Overview — Stats and Map](../img/geoip_dashboard.png)
+
+### Recent Suspicious Activity
+
+A table showing the last 10 suspicious requests with IP address, path, user-agent, and timestamp. Each entry provides actions to view the raw HTTP request or inspect the IP in detail.
+
+### Top IP Addresses
+
+A paginated, sortable table ranking IPs by access count. Each IP shows its category badge and can be clicked to expand inline details or open the IP Insight tab.
+
+### Top Paths
+
+A paginated table of the most accessed HTTP paths and their request counts.
+
+### Top User-Agents
+
+A paginated table of the most common user-agent strings with their frequency.
+
+![Overview — Tables](../img/overview_tables_dashboard.png)
+
+---
+
+## Attacks
+
+The Attacks tab focuses on detected malicious activity, attack patterns, and captured credentials.
+
+### Attackers by Total Requests
+
+A paginated table listing all detected attackers ranked by total requests. Columns include IP, total requests, first seen, last seen, and location. Sortable by multiple fields.
+
+![Attacks — Attackers and Credentials](../img/top_attackers_dashboard.png)
+
+### Captured Credentials
+
+A table of usernames and passwords captured from honeypot login forms, with timestamps. Useful for analyzing common credential stuffing patterns.
+
+### Honeypot Triggers by IP
+
+Shows which IPs accessed honeypot endpoints and how many times, sorted by trigger count.
+
+### Detected Attack Types
+
+A detailed table of individual attack detections showing IP, path, attack type classifications, user-agent, and timestamp. Each entry can be expanded to view the raw HTTP request.
+
+### Most Recurring Attack Types
+
+A Chart.js visualization showing the frequency distribution of detected attack categories (e.g., SQL injection, path traversal, XSS).
+
+### Most Recurring Attack Patterns
+
+A paginated table of specific attack patterns and their occurrence counts across all traffic.
+
+![Attacks — Attack Types and Patterns](../img/attack_types_dashboard.png)
+
+---
+
+## IP Insight
+
+The IP Insight tab provides a deep-dive view for a single IP address. It is activated by clicking "Inspect IP" from any table in the dashboard.
+
+### IP Information Card
+
+Displays comprehensive details about the selected IP:
+
+- **Activity** — total requests, first seen, last seen, last analysis timestamp
+- **Geo & Network** — location, region, timezone, ISP, ASN, reverse DNS
+- **Category** — classification badge (Attacker, Good Crawler, Bad Crawler, Regular User, Unknown)
+
+### Ban & Track Actions
+
+When authenticated, admin actions are available:
+
+- **Ban/Unban** — immediately add or remove the IP from the banlist
+- **Track/Untrack** — add the IP to your watchlist for ongoing monitoring
+
+### Blocklist Memberships
+
+Shows which threat intelligence blocklists the IP appears on, providing external reputation context.
+
+### Access Logs
+
+A filtered view of all requests made by this specific IP, with full request details.
+
+![IP Insight — Detail View](../img/ip_insight_dashboard.png)
+
+---
+
+## Tracked IPs
+
+> Requires authentication with the dashboard password.
+
+The Tracked IPs tab lets you maintain a watchlist of IP addresses you want to monitor over time.
+
+### Track New IP
+
+A form to add any IP address to your tracking list for ongoing observation.
+
+### Currently Tracked IPs
+
+A paginated table of all manually tracked IPs, with the option to untrack each one.
+
+![Tracked IPs](../img/tracked_ips_dashboard.png)
+
+---
+
+## IP Banlist
+
+> Requires authentication with the dashboard password.
+
+The IP Banlist tab provides tools for managing IP bans. Bans are exported every 5 minutes.
+
+### Force Ban IP
+
+A form to immediately ban any IP address by entering it manually.
+
+### Detected Attackers
+
+A paginated list of all IPs detected as attackers, with quick-ban actions for each entry.
+
+![IP Banlist — Detected](../img/banlist_attackers_dashboard.png)
+
+### Active Ban Overrides
+
+A table of currently active manual ban overrides, with options to unban or reset the override status for each IP.
+
+![IP Banlist — Overrides](../img/banlist_overrides_dashboard.png)
+
+### Export Banlist
+
+A dropdown menu to download the current banlist in two formats:
+
+- **Raw IPs List** — plain text, one IP per line
+- **IPTables Rules** — ready-to-use firewall rules
+
+---
+
+## Authentication
+
+The dashboard uses session-based authentication with secure HTTP-only cookies. Protected features (Tracked IPs, IP Banlist, ban/track actions) require entering the dashboard password. The login includes brute-force protection with IP-based rate limiting and exponential backoff.
+
+Click the lock icon in the top-right corner of the navigation bar to authenticate or log out.
+
+![Authentication Modal](../img/auth_prompt.png)
--- a/helm/Chart.yaml
+++ b/helm/Chart.yaml
@@ -2,8 +2,8 @@ apiVersion: v2
 name: krawl-chart
 description: A Helm chart for Krawl honeypot server
 type: application
-version: 1.1.7
-appVersion: 1.1.7
+version: 1.2.0
+appVersion: 1.2.0
 keywords:
  - honeypot
  - security
--- a/helm/README.md
+++ b/helm/README.md
@@ -14,7 +14,7 @@ A Helm chart for deploying the Krawl honeypot application on Kubernetes.

 ```bash
 helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
-  --version 1.1.3 \
+  --version 1.2.0 \
  --namespace krawl-system \
  --create-namespace \
  -f values.yaml  # optional
@@ -170,7 +170,7 @@ kubectl get secret krawl-server -n krawl-system \
 You can override individual values with `--set` without a values file:

 ```bash
-helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \
+helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 \
  --set ingress.hosts[0].host=honeypot.example.com \
  --set config.canary.token_url=https://canarytokens.com/your-token
 ```
@@ -178,7 +178,7 @@ helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \
 ## Upgrading

 ```bash
-helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 -f values.yaml
+helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 -f values.yaml
 ```

 ## Uninstalling
--- a/helm/values.yaml
+++ b/helm/values.yaml
@@ -3,7 +3,7 @@ replicaCount: 1
 image:
  repository: ghcr.io/blessedrebus/krawl
  pullPolicy: Always
-  tag: "1.1.3"
+  tag: "1.2.0"

 imagePullSecrets: []
 nameOverride: "krawl"
--- a/img/attack_types_dashboard.png
+++ b/img/attack_types_dashboard.png
--- a/img/auth_prompt.png
+++ b/img/auth_prompt.png
--- a/img/banlist_attackers_dashboard.png
+++ b/img/banlist_attackers_dashboard.png
--- a/img/banlist_overrides_dashboard.png
+++ b/img/banlist_overrides_dashboard.png
--- a/img/dashboard-1.png
+++ b/img/dashboard-1.png
--- a/img/dashboard-2.png
+++ b/img/dashboard-2.png
--- a/img/dashboard-3.png
+++ b/img/dashboard-3.png
--- a/img/database.png
+++ b/img/database.png
--- a/img/geoip_dashboard.png
+++ b/img/geoip_dashboard.png
--- a/img/overview_tables_dashboard.png
+++ b/img/overview_tables_dashboard.png
--- a/img/top_attackers_dashboard.png
+++ b/img/top_attackers_dashboard.png
--- a/img/tracked_ips_dashboard.png
+++ b/img/tracked_ips_dashboard.png
--- a/src/dashboard_cache.py
+++ b/src/dashboard_cache.py
@@ -0,0 +1,32 @@
+"""
+In-memory cache for dashboard Overview data.
+
+A background task periodically refreshes this cache so the dashboard
+serves pre-computed data instantly instead of hitting SQLite cold.
+
+Memory footprint stays bounded: the set of keys is fixed and each value is overwritten on every refresh.
+"""
+
+import threading
+from typing import Any, Dict, Optional
+
+_lock = threading.Lock()
+_cache: Dict[str, Any] = {}
+
+
+def get_cached(key: str) -> Optional[Any]:
+    """Get a value from the dashboard cache."""
+    with _lock:
+        return _cache.get(key)
+
+
+def set_cached(key: str, value: Any) -> None:
+    """Set a value in the dashboard cache."""
+    with _lock:
+        _cache[key] = value
+
+
+def is_warm() -> bool:
+    """Check if the cache has been populated at least once."""
+    with _lock:
+        return "stats" in _cache
--- a/src/routes/api.py
+++ b/src/routes/api.py
@@ -18,6 +18,7 @@ from pydantic import BaseModel

 from dependencies import get_db, get_client_ip
 from logger import get_app_logger
+from dashboard_cache import get_cached, is_warm

 # Server-side session token store (valid tokens for authenticated sessions)
 _auth_tokens: set = set()
@@ -249,10 +250,22 @@ async def all_ips(
    sort_by: str = Query("total_requests"),
    sort_order: str = Query("desc"),
 ):
-    db = get_db()
    page = max(1, page)
    page_size = min(max(1, page_size), 10000)

+    # Serve from cache on default map request (top 100 IPs)
+    if (
+        page == 1
+        and page_size == 100
+        and sort_by == "total_requests"
+        and sort_order == "desc"
+        and is_warm()
+    ):
+        cached = get_cached("map_ips")
+        if cached:
+            return JSONResponse(content=cached, headers=_no_cache_headers())
+
+    db = get_db()
    try:
        result = db.get_all_ips_paginated(
            page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
--- a/src/routes/dashboard.py
+++ b/src/routes/dashboard.py
@@ -10,6 +10,7 @@ from fastapi.responses import JSONResponse
 from logger import get_app_logger

 from dependencies import get_db, get_templates
+from dashboard_cache import get_cached, is_warm

 router = APIRouter()

@@ -17,15 +18,17 @@ router = APIRouter()
@router.get("")
@router.get("/")
 async def dashboard_page(request: Request):
-    db = get_db()
    config = request.app.state.config
    dashboard_path = "/" + config.dashboard_secret_path.lstrip("/")

-    # Get initial data for server-rendered sections
+    # Serve from pre-computed cache when available, fall back to live queries
+    if is_warm():
+        stats = get_cached("stats")
+        suspicious = get_cached("suspicious")
+    else:
+        db = get_db()
        stats = db.get_dashboard_counts()
        suspicious = db.get_recent_suspicious(limit=10)
-
-    # Get credential count for the stats card
        cred_result = db.get_credentials_paginated(page=1, page_size=1)
        stats["credential_count"] = cred_result["pagination"]["total"]

--- a/src/routes/htmx.py
+++ b/src/routes/htmx.py
@@ -10,6 +10,7 @@ from fastapi.responses import HTMLResponse

 from dependencies import get_db, get_templates
 from routes.api import verify_auth
+from dashboard_cache import get_cached, is_warm

 router = APIRouter()

@@ -58,6 +59,15 @@ async def htmx_top_ips(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
 ):
+    # Serve from cache on default first-page request
+    cached = (
+        get_cached("top_ips")
+        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
+        else None
+    )
+    if cached:
+        result = cached
+    else:
        db = get_db()
        result = db.get_top_ips_paginated(
            page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order
@@ -87,6 +97,14 @@ async def htmx_top_paths(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
 ):
+    cached = (
+        get_cached("top_paths")
+        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
+        else None
+    )
+    if cached:
+        result = cached
+    else:
        db = get_db()
        result = db.get_top_paths_paginated(
            page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
@@ -116,6 +134,14 @@ async def htmx_top_ua(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
 ):
+    cached = (
+        get_cached("top_ua")
+        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
+        else None
+    )
+    if cached:
+        result = cached
+    else:
        db = get_db()
        result = db.get_top_user_agents_paginated(
            page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
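Review note: the three HTMX handlers above repeat the same cache-or-query branch. The sketch below shows one way that branch could be factored out. It is not part of this PR: `cached_or_live` and the fake `load_top_paths` loader are hypothetical names, and the module-level `_cache` dict is only a stand-in for `dashboard_cache` so the snippet runs on its own.

```python
from typing import Any, Callable, Dict, Optional

# Stand-in for dashboard_cache so the sketch runs on its own.
_cache: Dict[str, Any] = {}


def get_cached(key: str) -> Optional[Any]:
    return _cache.get(key)


def is_warm() -> bool:
    return "stats" in _cache


def cached_or_live(key: str, is_default: bool, loader: Callable[[], Any]) -> Any:
    """Hypothetical helper: cached value for default requests, else loader()."""
    if is_default and is_warm():
        cached = get_cached(key)
        if cached:
            return cached
    return loader()


# Demo with a fake loader that records each simulated database hit.
db_hits = []


def load_top_paths() -> Dict[str, Any]:
    db_hits.append("query")
    return {"items": ["/live"]}


# Cold cache: the default request falls through to the loader.
cold = cached_or_live("top_paths", True, load_top_paths)

# Simulate one warmup run, then repeat the default request.
_cache.update({"stats": {}, "top_paths": {"items": ["/cached"]}})
warm = cached_or_live("top_paths", True, load_top_paths)

# A non-default request (other page or sort) always goes to the loader.
custom = cached_or_live("top_paths", False, load_top_paths)
```

As in the handlers, an empty or missing cached value is treated as a miss and falls back to the live query, so a request never receives `None` from the cache path.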
--- a/src/tasks/dashboard_warmup.py
+++ b/src/tasks/dashboard_warmup.py
@@ -0,0 +1,68 @@
+# tasks/dashboard_warmup.py
+
+"""
+Pre-computes all Overview tab data and stores it in the in-memory cache.
+This keeps SQLite page buffers warm and lets the dashboard respond instantly.
+"""
+
+from logger import get_app_logger
+from database import get_database
+from dashboard_cache import set_cached
+
+app_logger = get_app_logger()
+
+# ----------------------
+# TASK CONFIG
+# ----------------------
+TASK_CONFIG = {
+    "name": "dashboard-warmup",
+    "cron": "*/1 * * * *",
+    "enabled": True,
+    "run_when_loaded": True,
+}
+
+
+# ----------------------
+# TASK LOGIC
+# ----------------------
+def main():
+    """
+    Refresh the in-memory dashboard cache with current Overview data.
+    TasksMaster will call this function based on the cron schedule.
+    """
+    task_name = TASK_CONFIG.get("name")
+    app_logger.info(f"[Background Task] {task_name} starting...")
+
+    try:
+        db = get_database()
+
+        # --- Server-rendered data (stats cards + suspicious table) ---
+        stats = db.get_dashboard_counts()
+
+        cred_result = db.get_credentials_paginated(page=1, page_size=1)
+        stats["credential_count"] = cred_result["pagination"]["total"]
+
+        suspicious = db.get_recent_suspicious(limit=10)
+
+        # --- HTMX Overview tables (first page, default sort) ---
+        top_ips = db.get_top_ips_paginated(page=1, page_size=8)
+        top_ua = db.get_top_user_agents_paginated(page=1, page_size=5)
+        top_paths = db.get_top_paths_paginated(page=1, page_size=5)
+
+        # --- Map data (default: top 100 IPs by total_requests) ---
+        map_ips = db.get_all_ips_paginated(
+            page=1, page_size=100, sort_by="total_requests", sort_order="desc"
+        )
+
+        # Store in cache; "stats" goes last because is_warm() keys off it
+        set_cached("suspicious", suspicious)
+        set_cached("top_ips", top_ips)
+        set_cached("top_ua", top_ua)
+        set_cached("top_paths", top_paths)
+        set_cached("map_ips", map_ips)
+        set_cached("stats", stats)
+
+        app_logger.info(f"[Background Task] {task_name} cache refreshed successfully.")
+
+    except Exception as e:
+        app_logger.error(f"[Background Task] {task_name} failed: {e}")
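Review note: `is_warm()` only checks for the `"stats"` key, while the warmup task writes its six keys one at a time, so a request arriving mid-refresh could in principle see a warm cache with some entries still missing. A minimal sketch of one possible alternative is to build the whole snapshot first and publish it in a single locked step. The `replace_all` helper below is hypothetical and not part of this PR; the other two functions mirror `dashboard_cache.py`.

```python
import threading
from typing import Any, Dict, Optional

_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def replace_all(snapshot: Dict[str, Any]) -> None:
    """Hypothetical helper: publish a complete refresh in one locked step."""
    global _cache
    with _lock:
        # Shallow copy: keys added to the caller's dict later do not leak in.
        _cache = dict(snapshot)


def get_cached(key: str) -> Optional[Any]:
    with _lock:
        return _cache.get(key)


def is_warm() -> bool:
    with _lock:
        return "stats" in _cache


warm_before = is_warm()

# The warmup task would collect all query results, then publish them together.
replace_all({"stats": {"total_accesses": 0}, "suspicious": []})

warm_after = is_warm()
```

With this shape, readers see either the previous snapshot or the new one, never a mix of the two.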