Merge pull request #127 from BlessedRebuS/feat/release-1.2

Feat/release 1.2
Patrick Di Fazio
2026-03-10 11:31:18 +01:00
committed by GitHub
22 changed files with 355 additions and 45 deletions


@@ -39,11 +39,11 @@
- [Demo](#demo)
- [What is Krawl?](#what-is-krawl)
- [Krawl Dashboard](#krawl-dashboard)
- [Installation](#-installation)
- [Quickstart](#quickstart)
- [Docker Run](#docker-run)
- [Docker Compose](#docker-compose)
- [Kubernetes](#kubernetes)
- [Local (Python)](#local-python)
- [Uvicorn (Python)](#uvicorn-python)
- [Configuration](#configuration)
- [config.yaml](#configuration-via-configyaml)
- [Environment Variables](#configuration-via-enviromental-variables)
@@ -51,7 +51,7 @@
- [IP Reputation](#ip-reputation)
- [Forward Server Header](#forward-server-header)
- [Additional Documentation](#additional-documentation)
- [Contributing](#-contributing)
- [Contributing](#contributing)
## Demo
Tip: crawl the `robots.txt` paths for additional fun
@@ -88,24 +88,29 @@ You can easily expose Krawl alongside your other services to shield them from we
Krawl provides a comprehensive dashboard, accessible at a **random secret path** generated at startup or at a **custom path** configured via `KRAWL_DASHBOARD_SECRET_PATH`. This keeps the dashboard hidden from attackers scanning your honeypot.
The dashboard is organized in three main tabs:
The dashboard is organized in five tabs:
- **Overview** — High-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths.
- **Overview**: high-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths.
![geoip](img/geoip_dashboard.png)
- **Attacks** — Detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables.
- **Attacks**: detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables.
![attack_types](img/attack_types.png)
- **IP Insight** — In-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.
- **IP Insight**: in-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.
![ipinsight](img/ip_insight_dashboard.png)
Additionally, after authenticating with the dashboard password, two protected tabs become available:
- **Tracked IPs**: maintain a watchlist of IP addresses you want to monitor over time.
- **IP Banlist**: manage IP bans, view detected attackers, and export the banlist in raw or IPTables format.
For more details, see the [Dashboard documentation](docs/dashboard.md).
## 🚀 Installation
## Quickstart
### Docker Run
@@ -139,6 +144,7 @@ services:
environment:
- CONFIG_LOCATION=config.yaml
- TZ=Europe/Rome
# - KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard"
# - KRAWL_DASHBOARD_PASSWORD=my-secret-password
volumes:
- ./config.yaml:/app/config.yaml:ro
@@ -166,7 +172,7 @@ docker-compose down
### Kubernetes
**Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [using the helm chart](helm/README.md).
### Python + Uvicorn
### Uvicorn (Python)
Run Krawl directly with Python (version 3.13 suggested) and uvicorn for local development or testing:
@@ -307,7 +313,7 @@ location / {
| [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings |
| [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard |
## 🤝 Contributing
## Contributing
Contributions welcome! Please:
1. Fork the repository


@@ -2,20 +2,182 @@
Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`
The dashboard shows:
- Total and unique accesses
- Suspicious activity and attack detection
- Top IPs, paths, user-agents and GeoIP localization
- Real-time monitoring
The Krawl dashboard is a single-page application with **5 tabs**: Overview, Attacks, IP Insight, Tracked IPs, and IP Banlist. The last two tabs are only visible after authenticating with the dashboard password.
The attackers' access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
---
Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
## Overview
![dashboard-1](../img/dashboard-1.png)
The default landing page provides a high-level summary of all traffic and suspicious activity detected by Krawl.
The top IP addresses are shown along with the top paths and User-Agents.
### Stats Cards
![dashboard-2](../img/dashboard-2.png)
Seven metric cards are displayed at the top:
![dashboard-3](../img/dashboard-3.png)
- **Total Accesses** — total number of requests received
- **Unique IPs** — distinct IP addresses observed
- **Unique Paths** — distinct request paths
- **Suspicious Accesses** — requests flagged as suspicious
- **Honeypot Caught** — requests that hit honeypot endpoints
- **Credentials Captured** — login attempts captured by the honeypot
- **Unique Attackers** — distinct IPs classified as attackers
### Search
A real-time search bar lets you search across attacks, IPs, patterns, and locations. Results are loaded dynamically as you type.
### IP Origins Map
An interactive world map (powered by Leaflet) displays the geolocation of top IP addresses. You can filter by category:
- Attackers
- Bad Crawlers
- Good Crawlers
- Regular Users
- Unknown
The number of displayed IPs is configurable (top 10, 100, 1,000, or all).
![Overview — Stats and Map](../img/geoip_dashboard.png)
### Recent Suspicious Activity
A table showing the last 10 suspicious requests with IP address, path, user-agent, and timestamp. Each entry provides actions to view the raw HTTP request or inspect the IP in detail.
### Top IP Addresses
A paginated, sortable table ranking IPs by access count. Each IP shows its category badge and can be clicked to expand inline details or open the IP Insight tab.
### Top Paths
A paginated table of the most accessed HTTP paths and their request counts.
### Top User-Agents
A paginated table of the most common user-agent strings with their frequency.
![Overview — Tables](../img/overview_tables_dashboard.png)
---
## Attacks
The Attacks tab focuses on detected malicious activity, attack patterns, and captured credentials.
### Attackers by Total Requests
A paginated table listing all detected attackers ranked by total requests. Columns include IP, total requests, first seen, last seen, and location. Sortable by multiple fields.
![Attacks — Attackers and Credentials](../img/top_attackers_dashboard.png)
### Captured Credentials
A table of usernames and passwords captured from honeypot login forms, with timestamps. Useful for analyzing common credential stuffing patterns.
### Honeypot Triggers by IP
Shows which IPs accessed honeypot endpoints and how many times, sorted by trigger count.
### Detected Attack Types
A detailed table of individual attack detections showing IP, path, attack type classifications, user-agent, and timestamp. Each entry can be expanded to view the raw HTTP request.
### Most Recurring Attack Types
A Chart.js visualization showing the frequency distribution of detected attack categories (e.g., SQL injection, path traversal, XSS).
### Most Recurring Attack Patterns
A paginated table of specific attack patterns and their occurrence counts across all traffic.
![Attacks — Attack Types and Patterns](../img/attack_types_dashboard.png)
---
## IP Insight
The IP Insight tab provides a deep-dive view for a single IP address. It is activated by clicking "Inspect IP" from any table in the dashboard.
### IP Information Card
Displays comprehensive details about the selected IP:
- **Activity** — total requests, first seen, last seen, last analysis timestamp
- **Geo & Network** — location, region, timezone, ISP, ASN, reverse DNS
- **Category** — classification badge (Attacker, Good Crawler, Bad Crawler, Regular User, Unknown)
### Ban & Track Actions
When authenticated, admin actions are available:
- **Ban/Unban** — immediately add or remove the IP from the banlist
- **Track/Untrack** — add the IP to your watchlist for ongoing monitoring
### Blocklist Memberships
Shows which threat intelligence blocklists the IP appears on, providing external reputation context.
### Access Logs
A filtered view of all requests made by this specific IP, with full request details.
![IP Insight — Detail View](../img/ip_insight_dashboard.png)
---
## Tracked IPs
> Requires authentication with the dashboard password.
The Tracked IPs tab lets you maintain a watchlist of IP addresses you want to monitor over time.
### Track New IP
A form to add any IP address to your tracking list for ongoing observation.
### Currently Tracked IPs
A paginated table of all manually tracked IPs, with the option to untrack each one.
![Tracked IPs](../img/tracked_ips_dashboard.png)
---
## IP Banlist
> Requires authentication with the dashboard password.
The IP Banlist tab provides tools for managing IP bans. Bans are exported every 5 minutes.
### Force Ban IP
A form to immediately ban any IP address by entering it manually.
### Detected Attackers
A paginated list of all IPs detected as attackers, with quick-ban actions for each entry.
![IP Banlist — Detected](../img/banlist_attackers_dashboard.png)
### Active Ban Overrides
A table of currently active manual ban overrides, with options to unban or reset the override status for each IP.
![IP Banlist — Overrides](../img/banlist_overrides_dashboard.png)
### Export Banlist
A dropdown menu to download the current banlist in two formats:
- **Raw IPs List** — plain text, one IP per line
- **IPTables Rules** — ready-to-use firewall rules
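
To illustrate how the two export formats relate, here is a minimal sketch that turns a raw one-IP-per-line export into firewall commands. The rule shape (`-A INPUT -s <ip> -j DROP`), the use of `ip6tables` for IPv6 addresses, and the function name `raw_to_iptables` are assumptions made for this example; the rules Krawl actually exports may look different.

```python
import ipaddress
from typing import List


def raw_to_iptables(raw_text: str, chain: str = "INPUT") -> List[str]:
    """Convert a raw banlist (one IP per line) into DROP rules.

    Hypothetical helper: the rule layout is an assumption, not Krawl's format.
    """
    rules = []
    for line in raw_text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        try:
            ip = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip anything that is not a valid IP address
        # IPv6 addresses are handled by a separate tool
        tool = "iptables" if ip.version == 4 else "ip6tables"
        rules.append(f"{tool} -A {chain} -s {ip} -j DROP")
    return rules


if __name__ == "__main__":
    sample = "203.0.113.7\n\nnot-an-ip\n2001:db8::1\n"
    for rule in raw_to_iptables(sample):
        print(rule)
```

Validating each line before emitting a rule keeps a malformed entry from ending up in a firewall script.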
---
## Authentication
The dashboard uses session-based authentication with secure HTTP-only cookies. Protected features (Tracked IPs, IP Banlist, ban/track actions) require entering the dashboard password. The login includes brute-force protection with IP-based rate limiting and exponential backoff.
Click the lock icon in the top-right corner of the navigation bar to authenticate or log out.
![Authentication Modal](../img/auth_prompt.png)
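
The brute-force protection described above can be pictured with the following sketch of per-IP exponential backoff. It is not taken from Krawl's code: the class name `LoginThrottle`, the parameters `base_delay` and `max_delay`, and the doubling schedule are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class LoginThrottle:
    """Hypothetical per-IP limiter: each failed login doubles the lockout."""

    def __init__(
        self,
        base_delay: float = 1.0,
        max_delay: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock
        # ip -> (consecutive failures, time at which the next attempt is allowed)
        self._state: Dict[str, Tuple[int, float]] = {}

    def retry_after(self, ip: str) -> float:
        """Seconds the IP still has to wait; 0.0 means it may try now."""
        _, unlock_at = self._state.get(ip, (0, 0.0))
        return max(0.0, unlock_at - self.clock())

    def record_failure(self, ip: str) -> float:
        """Register a failed attempt and return the new lockout in seconds."""
        failures, _ = self._state.get(ip, (0, 0.0))
        failures += 1
        # 1x, 2x, 4x, ... the base delay, capped at max_delay
        delay = min(self.max_delay, self.base_delay * 2 ** (failures - 1))
        self._state[ip] = (failures, self.clock() + delay)
        return delay

    def record_success(self, ip: str) -> None:
        """A successful login clears the failure history for that IP."""
        self._state.pop(ip, None)
```

A login handler would call `retry_after(ip)` before checking the password and reject the attempt while the value is positive. Injecting `clock` makes the schedule easy to test without sleeping.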


@@ -2,8 +2,8 @@ apiVersion: v2
name: krawl-chart
description: A Helm chart for Krawl honeypot server
type: application
version: 1.1.7
appVersion: 1.1.7
version: 1.2.0
appVersion: 1.2.0
keywords:
- honeypot
- security


@@ -14,7 +14,7 @@ A Helm chart for deploying the Krawl honeypot application on Kubernetes.
```bash
helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
--version 1.1.3 \
--version 1.2.0 \
--namespace krawl-system \
--create-namespace \
-f values.yaml # optional
@@ -170,7 +170,7 @@ kubectl get secret krawl-server -n krawl-system \
You can override individual values with `--set` without a values file:
```bash
helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \
helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 \
--set ingress.hosts[0].host=honeypot.example.com \
--set config.canary.token_url=https://canarytokens.com/your-token
```
@@ -178,7 +178,7 @@ helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 \
## Upgrading
```bash
helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.3 -f values.yaml
helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.2.0 -f values.yaml
```
## Uninstalling


@@ -3,7 +3,7 @@ replicaCount: 1
image:
repository: ghcr.io/blessedrebus/krawl
pullPolicy: Always
tag: "1.1.3"
tag: "1.2.0"
imagePullSecrets: []
nameOverride: "krawl"

Binary image files under `img/` (contents not shown): 7 added, including `img/auth_prompt.png`; 4 removed; 1 modified (343 KiB before, 808 KiB after).
32
src/dashboard_cache.py Normal file

@@ -0,0 +1,32 @@
"""
In-memory cache for dashboard Overview data.

A background task periodically refreshes this cache so the dashboard
serves pre-computed data instantly instead of hitting SQLite cold.
Memory footprint is fixed: each key is overwritten on every refresh.
"""

import threading
from typing import Any, Dict, Optional

_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def get_cached(key: str) -> Optional[Any]:
    """Get a value from the dashboard cache."""
    with _lock:
        return _cache.get(key)


def set_cached(key: str, value: Any) -> None:
    """Set a value in the dashboard cache."""
    with _lock:
        _cache[key] = value


def is_warm() -> bool:
    """Check if the cache has been populated at least once."""
    with _lock:
        return "stats" in _cache
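
A short sketch of the cache contract, assuming the three functions behave as in the file above (they are repeated here only so the snippet runs on its own). It shows that a miss returns `None`, that `is_warm()` depends solely on the `"stats"` key, and that an empty cached value is falsy, so a caller that tests `if cached:` falls back to a live query for it. The sample payloads such as `{"total_accesses": 0}` are made up for the demonstration.

```python
import threading
from typing import Any, Dict, Optional

_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def get_cached(key: str) -> Optional[Any]:
    with _lock:
        return _cache.get(key)


def set_cached(key: str, value: Any) -> None:
    with _lock:
        _cache[key] = value


def is_warm() -> bool:
    with _lock:
        return "stats" in _cache


# A key that was never written is a plain miss.
print(get_cached("top_ips"))            # None

# Writing any key other than "stats" does not mark the cache as warm.
set_cached("top_ips", {"items": []})
print(is_warm())                        # False

# The "stats" key is the warm-up sentinel.
set_cached("stats", {"total_accesses": 0})
print(is_warm())                        # True

# An empty result is stored, but it is falsy for `if cached:` checks.
set_cached("suspicious", [])
print(bool(get_cached("suspicious")))   # False
```

Callers that need to tell "not cached yet" apart from "cached but empty" would compare against `None` instead of relying on truthiness.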


@@ -18,6 +18,7 @@ from pydantic import BaseModel
from dependencies import get_db, get_client_ip
from logger import get_app_logger
from dashboard_cache import get_cached, is_warm
# Server-side session token store (valid tokens for authenticated sessions)
_auth_tokens: set = set()
@@ -249,10 +250,22 @@ async def all_ips(
    sort_by: str = Query("total_requests"),
    sort_order: str = Query("desc"),
):
    db = get_db()
    page = max(1, page)
    page_size = min(max(1, page_size), 10000)

    # Serve from cache on default map request (top 100 IPs)
    if (
        page == 1
        and page_size == 100
        and sort_by == "total_requests"
        and sort_order == "desc"
        and is_warm()
    ):
        cached = get_cached("map_ips")
        if cached:
            return JSONResponse(content=cached, headers=_no_cache_headers())

    db = get_db()
    try:
        result = db.get_all_ips_paginated(
            page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order


@@ -10,6 +10,7 @@ from fastapi.responses import JSONResponse
from logger import get_app_logger
from dependencies import get_db, get_templates
from dashboard_cache import get_cached, is_warm
router = APIRouter()
@@ -17,15 +18,17 @@ router = APIRouter()
@router.get("")
@router.get("/")
async def dashboard_page(request: Request):
    db = get_db()
    config = request.app.state.config
    dashboard_path = "/" + config.dashboard_secret_path.lstrip("/")

    # Get initial data for server-rendered sections
    # Serve from pre-computed cache when available, fall back to live queries
    if is_warm():
        stats = get_cached("stats")
        suspicious = get_cached("suspicious")
    else:
        db = get_db()
        stats = db.get_dashboard_counts()
        suspicious = db.get_recent_suspicious(limit=10)
# Get credential count for the stats card
cred_result = db.get_credentials_paginated(page=1, page_size=1)
stats["credential_count"] = cred_result["pagination"]["total"]


@@ -10,6 +10,7 @@ from fastapi.responses import HTMLResponse
from dependencies import get_db, get_templates
from routes.api import verify_auth
from dashboard_cache import get_cached, is_warm
router = APIRouter()
@@ -58,6 +59,15 @@ async def htmx_top_ips(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
):
    # Serve from cache on default first-page request
    cached = (
        get_cached("top_ips")
        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
        else None
    )
    if cached:
        result = cached
    else:
        db = get_db()
        result = db.get_top_ips_paginated(
            page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order
@@ -87,6 +97,14 @@ async def htmx_top_paths(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
):
    cached = (
        get_cached("top_paths")
        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
        else None
    )
    if cached:
        result = cached
    else:
        db = get_db()
        result = db.get_top_paths_paginated(
            page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
@@ -116,6 +134,14 @@ async def htmx_top_ua(
    sort_by: str = Query("count"),
    sort_order: str = Query("desc"),
):
    cached = (
        get_cached("top_ua")
        if (page == 1 and sort_by == "count" and sort_order == "desc" and is_warm())
        else None
    )
    if cached:
        result = cached
    else:
        db = get_db()
        result = db.get_top_user_agents_paginated(
            page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
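
The same cache-eligibility check appears in all three HTMX handlers in this file. As a possible follow-up, it could be factored into one helper along these lines. This is only a sketch: `cached_first_page` is a hypothetical name that is not part of this PR, and the cache functions are passed in as arguments so the snippet runs without the rest of the application.

```python
from typing import Any, Callable, Optional


def cached_first_page(
    key: str,
    page: int,
    sort_by: str,
    sort_order: str,
    *,
    get_cached: Callable[[str], Optional[Any]],
    is_warm: Callable[[], bool],
    default_sort: str = "count",
) -> Optional[Any]:
    """Return the cached payload only for the default first-page request.

    Any other page or sort combination, or a cold cache, yields None so the
    caller falls back to a live database query.
    """
    is_default = page == 1 and sort_by == default_sort and sort_order == "desc"
    if is_default and is_warm():
        return get_cached(key)
    return None


if __name__ == "__main__":
    store = {"stats": {}, "top_ips": {"items": ["198.51.100.4"]}}
    hit = cached_first_page(
        "top_ips", 1, "count", "desc",
        get_cached=store.get, is_warm=lambda: "stats" in store,
    )
    print(hit)
```

Each handler would then reduce to one call followed by the existing `if cached: ... else: ...` fallback.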


@@ -0,0 +1,68 @@
# tasks/dashboard_warmup.py

"""
Pre-computes all Overview tab data and stores it in the in-memory cache.
This keeps SQLite page buffers warm and lets the dashboard respond instantly.
"""

from logger import get_app_logger
from database import get_database
from dashboard_cache import set_cached

app_logger = get_app_logger()

# ----------------------
# TASK CONFIG
# ----------------------

TASK_CONFIG = {
    "name": "dashboard-warmup",
    "cron": "*/1 * * * *",
    "enabled": True,
    "run_when_loaded": True,
}


# ----------------------
# TASK LOGIC
# ----------------------
def main():
    """
    Refresh the in-memory dashboard cache with current Overview data.
    TasksMaster will call this function based on the cron schedule.
    """
    task_name = TASK_CONFIG.get("name")
    app_logger.info(f"[Background Task] {task_name} starting...")

    try:
        db = get_database()

        # --- Server-rendered data (stats cards + suspicious table) ---
        stats = db.get_dashboard_counts()

        cred_result = db.get_credentials_paginated(page=1, page_size=1)
        stats["credential_count"] = cred_result["pagination"]["total"]

        suspicious = db.get_recent_suspicious(limit=10)

        # --- HTMX Overview tables (first page, default sort) ---
        top_ips = db.get_top_ips_paginated(page=1, page_size=8)
        top_ua = db.get_top_user_agents_paginated(page=1, page_size=5)
        top_paths = db.get_top_paths_paginated(page=1, page_size=5)

        # --- Map data (default: top 100 IPs by total_requests) ---
        map_ips = db.get_all_ips_paginated(
            page=1, page_size=100, sort_by="total_requests", sort_order="desc"
        )

        # Store everything in the cache (overwrites previous values)
        set_cached("stats", stats)
        set_cached("suspicious", suspicious)
        set_cached("top_ips", top_ips)
        set_cached("top_ua", top_ua)
        set_cached("top_paths", top_paths)
        set_cached("map_ips", map_ips)

        app_logger.info(f"[Background Task] {task_name} cache refreshed successfully.")

    except Exception as e:
        app_logger.error(f"[Background Task] {task_name} failed: {e}")
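
One detail worth noting about the store step: `"stats"` is written first, and `is_warm()` only checks for that key, so during the very first refresh a reader could see a warm cache while `"suspicious"` or `"map_ips"` are still missing. The window is tiny, and most readers already guard with `if cached:`. If it ever matters, a single locked update would close it. The sketch below assumes a hypothetical `set_many` helper that is not part of this PR, and repeats a minimal version of the cache so it runs standalone.

```python
import threading
from typing import Any, Dict, Optional

_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def get_cached(key: str) -> Optional[Any]:
    with _lock:
        return _cache.get(key)


def is_warm() -> bool:
    with _lock:
        return "stats" in _cache


def set_many(values: Dict[str, Any]) -> None:
    """Hypothetical helper: publish every key under one lock acquisition.

    Readers either see the previous snapshot or the complete new one,
    never a cache where "stats" exists but the other keys do not.
    """
    with _lock:
        _cache.update(values)


if __name__ == "__main__":
    # Placeholder payloads standing in for the real query results.
    set_many(
        {
            "stats": {"credential_count": 0},
            "suspicious": [],
            "top_ips": {"items": []},
        }
    )
    print(is_warm(), get_cached("suspicious"))
```

The warm-up task would build the dictionary from its query results and call `set_many` once instead of issuing six separate `set_cached` calls.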