Merge pull request #127 from BlessedRebuS/feat/release-1.2

Feat/release 1.2
Patrick Di Fazio
2026-03-10 11:31:18 +01:00
committed by GitHub
22 changed files with 355 additions and 45 deletions


@@ -39,11 +39,11 @@
 - [Demo](#demo)
 - [What is Krawl?](#what-is-krawl)
 - [Krawl Dashboard](#krawl-dashboard)
-- [Installation](#-installation)
+- [Quickstart](#quickstart)
 - [Docker Run](#docker-run)
 - [Docker Compose](#docker-compose)
 - [Kubernetes](#kubernetes)
-- [Local (Python)](#local-python)
+- [Uvicorn (Python)](#uvicorn-python)
 - [Configuration](#configuration)
 - [config.yaml](#configuration-via-configyaml)
 - [Environment Variables](#configuration-via-enviromental-variables)
@@ -51,7 +51,7 @@
 - [IP Reputation](#ip-reputation)
 - [Forward Server Header](#forward-server-header)
 - [Additional Documentation](#additional-documentation)
-- [Contributing](#-contributing)
+- [Contributing](#contributing)
 ## Demo
 Tip: crawl the `robots.txt` paths for additional fun
@@ -88,24 +88,29 @@ You can easily expose Krawl alongside your other services to shield them from we
 Krawl provides a comprehensive dashboard, accessible at a **random secret path** generated at startup or at a **custom path** configured via `KRAWL_DASHBOARD_SECRET_PATH`. This keeps the dashboard hidden from attackers scanning your honeypot.
-The dashboard is organized in three main tabs:
+The dashboard is organized in five tabs:
-- **Overview** — High-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths.
+- **Overview**: high-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths.
 ![geoip](img/geoip_dashboard.png)
-- **Attacks** — Detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables.
+- **Attacks**: detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables.
 ![attack_types](img/attack_types.png)
-- **IP Insight** — In-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.
+- **IP Insight**: in-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.
 ![ipinsight](img/ip_insight_dashboard.png)
+Additionally, after authenticating with the dashboard password, two protected tabs become available:
+- **Tracked IPs**: maintain a watchlist of IP addresses you want to monitor over time.
+- **IP Banlist**: manage IP bans, view detected attackers, and export the banlist in raw or IPTables format.
 For more details, see the [Dashboard documentation](docs/dashboard.md).
-## 🚀 Installation
+## Quickstart
 ### Docker Run
@@ -139,6 +144,7 @@ services:
 environment:
 - CONFIG_LOCATION=config.yaml
 - TZ=Europe/Rome
+# - KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard"
 # - KRAWL_DASHBOARD_PASSWORD=my-secret-password
 volumes:
 - ./config.yaml:/app/config.yaml:ro
@@ -166,7 +172,7 @@ docker-compose down
 ### Kubernetes
 **Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [using the helm chart](helm/README.md).
-### Python + Uvicorn
+### Uvicorn (Python)
 Run Krawl directly with Python (suggested version 13) and uvicorn for local development or testing:
@@ -307,7 +313,7 @@ location / {
 | [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings |
 | [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard |
-## 🤝 Contributing
+## Contributing
 Contributions welcome! Please:
 1. Fork the repository


@@ -2,20 +2,182 @@
 Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`
-The dashboard shows:
-- Total and unique accesses
-- Suspicious activity and attack detection
-- Top IPs, paths, user-agents and GeoIP localization
-- Real-time monitoring
-The attackers' access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
-Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
-![dashboard-1](../img/dashboard-1.png)
-The top IP Addresses is shown along with top paths and User Agents
-![dashboard-2](../img/dashboard-2.png)
-![dashboard-3](../img/dashboard-3.png)
+The Krawl dashboard is a single-page application with **5 tabs**: Overview, Attacks, IP Insight, Tracked IPs, and IP Banlist. The last two tabs are only visible after authenticating with the dashboard password.
+---
+## Overview
+The default landing page provides a high-level summary of all traffic and suspicious activity detected by Krawl.
+### Stats Cards
+Seven metric cards are displayed at the top:
+- **Total Accesses** — total number of requests received
+- **Unique IPs** — distinct IP addresses observed
+- **Unique Paths** — distinct request paths
+- **Suspicious Accesses** — requests flagged as suspicious
+- **Honeypot Caught** — requests that hit honeypot endpoints
+- **Credentials Captured** — login attempts captured by the honeypot
+- **Unique Attackers** — distinct IPs classified as attackers
+### Search
+A real-time search bar lets you search across attacks, IPs, patterns, and locations. Results are loaded dynamically as you type.
+### IP Origins Map
+An interactive world map (powered by Leaflet) displays the geolocation of top IP addresses. You can filter by category:
+- Attackers
+- Bad Crawlers
+- Good Crawlers
+- Regular Users
+- Unknown
+The number of displayed IPs is configurable (top 10, 100, 1,000, or all).
+![Overview — Stats and Map](../img/geoip_dashboard.png)
+### Recent Suspicious Activity
+A table showing the last 10 suspicious requests with IP address, path, user-agent, and timestamp. Each entry provides actions to view the raw HTTP request or inspect the IP in detail.
+### Top IP Addresses
+A paginated, sortable table ranking IPs by access count. Each IP shows its category badge and can be clicked to expand inline details or open the IP Insight tab.
+### Top Paths
+A paginated table of the most accessed HTTP paths and their request counts.
+### Top User-Agents
+A paginated table of the most common user-agent strings with their frequency.
+![Overview — Tables](../img/overview_tables_dashboard.png)
+---
+## Attacks
+The Attacks tab focuses on detected malicious activity, attack patterns, and captured credentials.
+### Attackers by Total Requests
+A paginated table listing all detected attackers ranked by total requests. Columns include IP, total requests, first seen, last seen, and location. Sortable by multiple fields.
+![Attacks — Attackers and Credentials](../img/top_attackers_dashboard.png)
+### Captured Credentials
+A table of usernames and passwords captured from honeypot login forms, with timestamps. Useful for analyzing common credential stuffing patterns.
+### Honeypot Triggers by IP
+Shows which IPs accessed honeypot endpoints and how many times, sorted by trigger count.
+### Detected Attack Types
+A detailed table of individual attack detections showing IP, path, attack type classifications, user-agent, and timestamp. Each entry can be expanded to view the raw HTTP request.
+### Most Recurring Attack Types
+A Chart.js visualization showing the frequency distribution of detected attack categories (e.g., SQL injection, path traversal, XSS).
+### Most Recurring Attack Patterns
+A paginated table of specific attack patterns and their occurrence counts across all traffic.
+![Attacks — Attack Types and Patterns](../img/attack_types_dashboard.png)
+---
+## IP Insight
+The IP Insight tab provides a deep-dive view for a single IP address. It is activated by clicking "Inspect IP" from any table in the dashboard.
+### IP Information Card
+Displays comprehensive details about the selected IP:
+- **Activity** — total requests, first seen, last seen, last analysis timestamp
+- **Geo & Network** — location, region, timezone, ISP, ASN, reverse DNS
+- **Category** — classification badge (Attacker, Good Crawler, Bad Crawler, Regular User, Unknown)
+### Ban & Track Actions
+When authenticated, admin actions are available:
+- **Ban/Unban** — immediately add or remove the IP from the banlist
+- **Track/Untrack** — add the IP to your watchlist for ongoing monitoring
+### Blocklist Memberships
+Shows which threat intelligence blocklists the IP appears on, providing external reputation context.
+### Access Logs
+A filtered view of all requests made by this specific IP, with full request details.
+![IP Insight — Detail View](../img/ip_insight_dashboard.png)
+---
+## Tracked IPs
+> Requires authentication with the dashboard password.
+The Tracked IPs tab lets you maintain a watchlist of IP addresses you want to monitor over time.
+### Track New IP
+A form to add any IP address to your tracking list for ongoing observation.
+### Currently Tracked IPs
+A paginated table of all manually tracked IPs, with the option to untrack each one.
+![Tracked IPs](../img/tracked_ips_dashboard.png)
+---
+## IP Banlist
+> Requires authentication with the dashboard password.
+The IP Banlist tab provides tools for managing IP bans. Bans are exported every 5 minutes.
+### Force Ban IP
+A form to immediately ban any IP address by entering it manually.
+### Detected Attackers
+A paginated list of all IPs detected as attackers, with quick-ban actions for each entry.
+![IP Banlist — Detected](../img/banlist_attackers_dashboard.png)
+### Active Ban Overrides
+A table of currently active manual ban overrides, with options to unban or reset the override status for each IP.
+![IP Banlist — Overrides](../img/banlist_overrides_dashboard.png)
+### Export Banlist
+A dropdown menu to download the current banlist in two formats:
+- **Raw IPs List** — plain text, one IP per line
+- **IPTables Rules** — ready-to-use firewall rules
+---
+## Authentication
+The dashboard uses session-based authentication with secure HTTP-only cookies. Protected features (Tracked IPs, IP Banlist, ban/track actions) require entering the dashboard password. The login includes brute-force protection with IP-based rate limiting and exponential backoff.
+Click the lock icon in the top-right corner of the navigation bar to authenticate or log out.
+![Authentication Modal](../img/auth_prompt.png)
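
The Authentication section above mentions IP-based rate limiting with exponential backoff, but the login code itself is not part of the hunks shown here. The following is a minimal sketch of the general technique only. The `LoginThrottle` class, its method names, the 1-second base delay, and the 5-minute cap are all assumptions for illustration and are not taken from Krawl.

```python
import time
from typing import Dict, Optional, Tuple


class LoginThrottle:
    """Hypothetical per-IP lockout with exponentially growing delays."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 300.0) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        # ip -> (consecutive failures, time at which the next attempt is allowed)
        self._state: Dict[str, Tuple[int, float]] = {}

    def retry_after(self, ip: str, now: Optional[float] = None) -> float:
        """Seconds the IP must still wait; 0.0 means an attempt is allowed."""
        now = time.monotonic() if now is None else now
        _, allowed_at = self._state.get(ip, (0, 0.0))
        return max(0.0, allowed_at - now)

    def record_failure(self, ip: str, now: Optional[float] = None) -> float:
        """Register a failed login and return the new lockout in seconds."""
        now = time.monotonic() if now is None else now
        failures = self._state.get(ip, (0, 0.0))[0] + 1
        # The delay doubles with each consecutive failure, up to max_delay.
        delay = min(self.base_delay * 2 ** (failures - 1), self.max_delay)
        self._state[ip] = (failures, now + delay)
        return delay

    def record_success(self, ip: str) -> None:
        """A successful login clears the failure history for this IP."""
        self._state.pop(ip, None)
```

A login handler built on this sketch would call `retry_after` before checking the password and reject the request while the returned value is positive.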


@@ -2,8 +2,8 @@ apiVersion: v2
 name: krawl-chart
 description: A Helm chart for Krawl honeypot server
 type: application
-version: 1.1.7
-appVersion: 1.1.7
+version: 1.2.0
+appVersion: 1.2.0
 keywords:
 - honeypot
 - security


@@ -14,7 +14,7 @@ A Helm chart for deploying the Krawl honeypot application on Kubernetes.
 ```bash
 helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
-  --version 1.1.3 \
+  --version 1.2.0 \
   --namespace krawl-system \
   --create-namespace \
   -f values.yaml # optional
@@ -170,7 +170,7 @@ kubectl get secret krawl-server -n krawl-system \
 You can override individual values with `--set` without a values file:
 ```bash
-helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \
+helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 \
   --set ingress.hosts[0].host=honeypot.example.com \
   --set config.canary.token_url=https://canarytokens.com/your-token
 ```
@@ -178,7 +178,7 @@ helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \
 ## Upgrading
 ```bash
-helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 -f values.yaml
+helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 -f values.yaml
 ```
 ## Uninstalling


@@ -3,7 +3,7 @@ replicaCount: 1
 image:
 repository: ghcr.io/blessedrebus/krawl
 pullPolicy: Always
-tag: "1.1.3"
+tag: "1.2.0"
 imagePullSecrets: []
 nameOverride: "krawl"

Binary files not shown: images added (283, 70, 173, 77, 290, 312, and 131 KiB), removed (133, 76, 206, and 72 KiB), and replaced (343 KiB before, 808 KiB after). Named in this view: `img/auth_prompt.png` (new file, 70 KiB).

32
src/dashboard_cache.py Normal file

@@ -0,0 +1,32 @@
+"""
+In-memory cache for dashboard Overview data.
+
+A background task periodically refreshes this cache so the dashboard
+serves pre-computed data instantly instead of hitting SQLite cold.
+
+Memory footprint is fixed — each key is overwritten on every refresh.
+"""
+
+import threading
+from typing import Any, Dict, Optional
+
+_lock = threading.Lock()
+_cache: Dict[str, Any] = {}
+
+
+def get_cached(key: str) -> Optional[Any]:
+    """Get a value from the dashboard cache."""
+    with _lock:
+        return _cache.get(key)
+
+
+def set_cached(key: str, value: Any) -> None:
+    """Set a value in the dashboard cache."""
+    with _lock:
+        _cache[key] = value
+
+
+def is_warm() -> bool:
+    """Check if the cache has been populated at least once."""
+    with _lock:
+        return "stats" in _cache
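
A short usage sketch of the module above. The three functions are re-declared so the block runs on its own, and the sample values are made up. It illustrates that `is_warm()` only checks for the `stats` key, so a reader can see a warm cache while another key is still unset and `get_cached` returns `None` for it.

```python
import threading
from typing import Any, Dict, Optional

_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def get_cached(key: str) -> Optional[Any]:
    with _lock:
        return _cache.get(key)


def set_cached(key: str, value: Any) -> None:
    with _lock:
        _cache[key] = value


def is_warm() -> bool:
    with _lock:
        return "stats" in _cache


# Cold cache: nothing has been stored yet.
cold = (is_warm(), get_cached("stats"))

# The first write of a refresh flips is_warm() to True...
set_cached("stats", {"total_accesses": 42})  # made-up sample value
warm_after_stats = is_warm()
# ...while keys written later in the same refresh are still missing.
suspicious_before = get_cached("suspicious")

set_cached("suspicious", [])
suspicious_after = get_cached("suspicious")
```

Callers are therefore safest when they treat `None` as a cache miss, as the `if cached:` checks in the route changes do. `dashboard_page` reads `suspicious` without such a check, so during the very first refresh there appears to be a brief window in which it could receive `None`.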


@@ -18,6 +18,7 @@ from pydantic import BaseModel
 from dependencies import get_db, get_client_ip
 from logger import get_app_logger
+from dashboard_cache import get_cached, is_warm
 # Server-side session token store (valid tokens for authenticated sessions)
 _auth_tokens: set = set()
@@ -249,10 +250,22 @@ async def all_ips(
     sort_by: str = Query("total_requests"),
     sort_order: str = Query("desc"),
 ):
-    db = get_db()
     page = max(1, page)
     page_size = min(max(1, page_size), 10000)
+    # Serve from cache on default map request (top 100 IPs)
+    if (
+        page == 1
+        and page_size == 100
+        and sort_by == "total_requests"
+        and sort_order == "desc"
+        and is_warm()
+    ):
+        cached = get_cached("map_ips")
+        if cached:
+            return JSONResponse(content=cached, headers=_no_cache_headers())
+    db = get_db()
     try:
         result = db.get_all_ips_paginated(
             page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order


@@ -10,6 +10,7 @@ from fastapi.responses import JSONResponse
 from logger import get_app_logger
 from dependencies import get_db, get_templates
+from dashboard_cache import get_cached, is_warm
 router = APIRouter()
@@ -17,17 +18,19 @@ router = APIRouter()
 @router.get("")
 @router.get("/")
 async def dashboard_page(request: Request):
-    db = get_db()
     config = request.app.state.config
     dashboard_path = "/" + config.dashboard_secret_path.lstrip("/")
-    # Get initial data for server-rendered sections
-    stats = db.get_dashboard_counts()
-    suspicious = db.get_recent_suspicious(limit=10)
-    # Get credential count for the stats card
-    cred_result = db.get_credentials_paginated(page=1, page_size=1)
-    stats["credential_count"] = cred_result["pagination"]["total"]
+    # Serve from pre-computed cache when available, fall back to live queries
+    if is_warm():
+        stats = get_cached("stats")
+        suspicious = get_cached("suspicious")
+    else:
+        db = get_db()
+        stats = db.get_dashboard_counts()
+        suspicious = db.get_recent_suspicious(limit=10)
+        cred_result = db.get_credentials_paginated(page=1, page_size=1)
+        stats["credential_count"] = cred_result["pagination"]["total"]
     templates = get_templates()
     return templates.TemplateResponse(


@@ -10,6 +10,7 @@ from fastapi.responses import HTMLResponse
 from dependencies import get_db, get_templates
 from routes.api import verify_auth
+from dashboard_cache import get_cached, is_warm
 router = APIRouter()
@@ -58,10 +59,19 @@ async def htmx_top_ips(
     sort_by: str = Query("count"),
     sort_order: str = Query("desc"),
 ):
-    db = get_db()
-    result = db.get_top_ips_paginated(
-        page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order
+    # Serve from cache on default first-page request
+    cached = (
+        get_cached("top_ips")
+        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
+        else None
     )
+    if cached:
+        result = cached
+    else:
+        db = get_db()
+        result = db.get_top_ips_paginated(
+            page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order
+        )
     templates = get_templates()
     return templates.TemplateResponse(
@@ -87,10 +97,18 @@ async def htmx_top_paths(
     sort_by: str = Query("count"),
     sort_order: str = Query("desc"),
 ):
-    db = get_db()
-    result = db.get_top_paths_paginated(
-        page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
+    cached = (
+        get_cached("top_paths")
+        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
+        else None
     )
+    if cached:
+        result = cached
+    else:
+        db = get_db()
+        result = db.get_top_paths_paginated(
+            page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
+        )
     templates = get_templates()
     return templates.TemplateResponse(
@@ -116,10 +134,18 @@ async def htmx_top_ua(
     sort_by: str = Query("count"),
     sort_order: str = Query("desc"),
 ):
-    db = get_db()
-    result = db.get_top_user_agents_paginated(
-        page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
+    cached = (
+        get_cached("top_ua")
+        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
+        else None
     )
+    if cached:
+        result = cached
+    else:
+        db = get_db()
+        result = db.get_top_user_agents_paginated(
+            page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
+        )
     templates = get_templates()
     return templates.TemplateResponse(
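
The three handlers above repeat the same cache-or-query logic. One way it could be factored out is sketched below. `cached_or_query` and `is_default_request` are hypothetical helpers that are not part of this PR, and the cache is replaced by a plain dict so the block runs standalone. Unlike the handlers, the sketch does not call `is_warm()` and relies on the key being present instead.

```python
from typing import Any, Callable, Dict, List

_cache: Dict[str, Any] = {}  # stand-in for the dashboard_cache module


def is_default_request(page: int, sort_by: str, sort_order: str) -> bool:
    """Mirror the condition used in the handlers: first page, default sort."""
    return page == 1 and sort_by == "count" and sort_order == "desc"


def cached_or_query(key: str, is_default: bool, query: Callable[[], Any]) -> Any:
    """Return the cached value for default requests, else run the live query.

    Hypothetical helper: falls back to query() when the request is not the
    default one, or when the cache holds nothing (or an empty value) for key.
    """
    cached = _cache.get(key) if is_default else None
    return cached if cached else query()


calls: List[str] = []


def live_query() -> Dict[str, List[str]]:
    """Pretend database call that records each invocation."""
    calls.append("db")
    return {"items": ["live"]}


_cache["top_paths"] = {"items": ["cached"]}  # made-up sample payload
hit = cached_or_query("top_paths", is_default_request(1, "count", "desc"), live_query)
miss = cached_or_query("top_paths", is_default_request(2, "count", "desc"), live_query)
```

Only the second call reaches `live_query`, because page 2 is not the default request.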


@@ -0,0 +1,68 @@
+# tasks/dashboard_warmup.py
+
+"""
+Pre-computes all Overview tab data and stores it in the in-memory cache.
+This keeps SQLite page buffers warm and lets the dashboard respond instantly.
+"""
+
+from logger import get_app_logger
+from database import get_database
+from dashboard_cache import set_cached
+
+app_logger = get_app_logger()
+
+# ----------------------
+# TASK CONFIG
+# ----------------------
+TASK_CONFIG = {
+    "name": "dashboard-warmup",
+    "cron": "*/1 * * * *",
+    "enabled": True,
+    "run_when_loaded": True,
+}
+
+
+# ----------------------
+# TASK LOGIC
+# ----------------------
+def main():
+    """
+    Refresh the in-memory dashboard cache with current Overview data.
+    TasksMaster will call this function based on the cron schedule.
+    """
+    task_name = TASK_CONFIG.get("name")
+    app_logger.info(f"[Background Task] {task_name} starting...")
+
+    try:
+        db = get_database()
+
+        # --- Server-rendered data (stats cards + suspicious table) ---
+        stats = db.get_dashboard_counts()
+        cred_result = db.get_credentials_paginated(page=1, page_size=1)
+        stats["credential_count"] = cred_result["pagination"]["total"]
+        suspicious = db.get_recent_suspicious(limit=10)
+
+        # --- HTMX Overview tables (first page, default sort) ---
+        top_ips = db.get_top_ips_paginated(page=1, page_size=8)
+        top_ua = db.get_top_user_agents_paginated(page=1, page_size=5)
+        top_paths = db.get_top_paths_paginated(page=1, page_size=5)
+
+        # --- Map data (default: top 100 IPs by total_requests) ---
+        map_ips = db.get_all_ips_paginated(
+            page=1, page_size=100, sort_by="total_requests", sort_order="desc"
+        )
+
+        # Store everything in the cache (overwrites previous values)
+        set_cached("stats", stats)
+        set_cached("suspicious", suspicious)
+        set_cached("top_ips", top_ips)
+        set_cached("top_ua", top_ua)
+        set_cached("top_paths", top_paths)
+        set_cached("map_ips", map_ips)
+
+        app_logger.info(f"[Background Task] {task_name} cache refreshed successfully.")
+
+    except Exception as e:
+        app_logger.error(f"[Background Task] {task_name} failed: {e}")
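
Because every `set_cached` call in the task comes after all queries have finished, a failing query leaves the previous cache contents in place instead of a half-updated mix. A minimal sketch of that behavior follows. `FakeDB` and the trimmed-down `refresh` function are hypothetical stand-ins for the real database and task, and the cache is a plain dict.

```python
from typing import Any, Dict, List

cache: Dict[str, Any] = {}  # stand-in for the dashboard_cache module


class FakeDB:
    """Hypothetical database stub; fail=True makes the second query raise."""

    def __init__(self, total: int, fail: bool = False) -> None:
        self.total = total
        self.fail = fail

    def get_dashboard_counts(self) -> Dict[str, int]:
        return {"total_accesses": self.total}

    def get_recent_suspicious(self, limit: int = 10) -> List[Any]:
        if self.fail:
            raise RuntimeError("database is locked")
        return []


def refresh(db: FakeDB) -> bool:
    """Query everything first, then write; return False if a query failed."""
    try:
        stats = db.get_dashboard_counts()
        suspicious = db.get_recent_suspicious(limit=10)
        # Writes happen only after every query has succeeded.
        cache["stats"] = stats
        cache["suspicious"] = suspicious
        return True
    except Exception:
        return False


first = refresh(FakeDB(total=10))
second = refresh(FakeDB(total=99, fail=True))
```

After the failed second run, `cache["stats"]` still holds the value from the first run, so the dashboard would keep serving slightly stale data until the next successful refresh.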