docs/architecture.md

# Krawl Architecture

## Overview

Krawl is a cloud-native deception honeypot server built on **FastAPI**. It creates realistic fake web applications (admin panels, login pages, fake credentials) to attract, detect, and analyze malicious crawlers and attackers while wasting their resources with infinite spider-trap pages.

## Tech Stack

| Layer | Technology |
|-------|-----------|
| **Backend** | FastAPI, Uvicorn, Python 3.11 |
| **ORM / DB** | SQLAlchemy 2.0, SQLite (WAL mode) |
| **Templating** | Jinja2 (server-side rendering) |
| **Reactivity** | Alpine.js 3.14 |
| **Partial Updates** | HTMX 2.0 |
| **Charts** | Chart.js 3.9 (doughnut), custom SVG radar |
| **Maps** | Leaflet 1.9 + CartoDB dark tiles |
| **Scheduling** | APScheduler |
| **Container** | Docker (python:3.11-slim), Helm/K8s ready |

## Directory Structure

```
Krawl/
├── src/
│   ├── app.py                    # FastAPI app factory + lifespan
│   ├── config.py                 # YAML + env config loader
│   ├── dependencies.py           # DI providers (templates, DB, client IP)
│   ├── database.py               # DatabaseManager singleton
│   ├── models.py                 # SQLAlchemy ORM models
│   ├── tracker.py                # In-memory + DB access tracking
│   ├── logger.py                 # Rotating file log handlers
│   ├── deception_responses.py    # Attack detection + fake responses
│   ├── sanitizer.py              # Input sanitization
│   ├── generators.py             # Random content generators
│   ├── wordlists.py              # JSON wordlist loader
│   ├── geo_utils.py              # IP geolocation API
│   ├── ip_utils.py               # IP validation
│   │
│   ├── routes/
│   │   ├── honeypot.py           # Trap pages, credential capture, catch-all
│   │   ├── dashboard.py          # Dashboard page (Jinja2 SSR)
│   │   ├── api.py                # JSON API endpoints
│   │   └── htmx.py               # HTMX HTML fragment endpoints
│   │
│   ├── middleware/
│   │   ├── deception.py          # Path traversal / XXE / cmd injection detection
│   │   └── ban_check.py          # Banned IP enforcement
│   │
│   ├── tasks/                    # APScheduler background jobs
│   │   ├── analyze_ips.py        # IP categorization scoring
│   │   ├── fetch_ip_rep.py       # Geolocation + blocklist enrichment
│   │   ├── db_dump.py            # Database export
│   │   ├── memory_cleanup.py     # In-memory list trimming
│   │   └── top_attacking_ips.py  # Top attacker caching
│   │
│   ├── tasks_master.py           # Task discovery + APScheduler orchestrator
│   ├── firewall/                 # Banlist export (iptables, raw)
│   ├── migrations/               # Schema migrations (auto-run)
│   │
│   └── templates/
│       ├── jinja2/
│       │   ├── base.html                     # Layout + CDN scripts
│       │   └── dashboard/
│       │       ├── index.html                # Main dashboard page
│       │       └── partials/                 # 13 HTMX fragment templates
│       ├── html/                             # Deceptive trap page templates
│       └── static/
│           ├── css/dashboard.css
│           └── js/
│               ├── dashboard.js              # Alpine.js app controller
│               ├── map.js                    # Leaflet map
│               ├── charts.js                 # Chart.js doughnut
│               └── radar.js                  # SVG radar chart
│
├── config.yaml               # Application configuration
├── wordlists.json             # Attack patterns + fake credentials
├── Dockerfile                 # Container build
├── docker-compose.yaml        # Local orchestration
├── entrypoint.sh              # Container startup (gosu privilege drop)
├── kubernetes/                # K8s manifests
└── helm/                      # Helm chart
```

## Application Entry Point

`src/app.py` uses the **FastAPI application factory** pattern with an async lifespan manager:

```
Startup                              Shutdown
  │                                    │
  ├─ Initialize logging                └─ Log shutdown
  ├─ Initialize SQLite DB
  ├─ Create AccessTracker
  ├─ Load webpages file (optional)
  ├─ Store config + tracker in app.state
  ├─ Start APScheduler background tasks
  └─ Log dashboard URL
```

## Request Pipeline

```
        Request
          │
          ▼
┌──────────────────────┐
│  BanCheckMiddleware  │──→ IP banned? → Return 500
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ DeceptionMiddleware  │──→ Attack detected? → Fake error response
└──────────┬───────────┘
           ▼
┌───────────────────────┐
│ ServerHeaderMiddleware│──→ Add random Server header
└──────────┬────────────┘
           ▼
┌───────────────────────┐
│     Route Matching    │
│  (ordered by priority)│
│                       │
│  1. Static files      │  /{secret}/static/*
│  2. Dashboard router  │  /{secret}/          (prefix-based)
│  3. API router        │  /{secret}/api/*     (prefix-based)
│  4. HTMX router       │  /{secret}/htmx/*   (prefix-based)
│  5. Honeypot router   │  /* (catch-all)
└───────────────────────┘
```

### Prefix-Based Routing

Dashboard, API, and HTMX routers are mounted with `prefix=f"/{secret}"` in `app.py`. This means:
- Route handlers define paths **without** the secret (e.g., `@router.get("/api/all-ips")`)
- FastAPI prepends the secret automatically (e.g., `GET /a1b2c3/api/all-ips`)
- The honeypot catch-all `/{path:path}` only matches paths that **don't** start with the secret
- No `_is_dashboard_path()` checks needed — the prefix handles access scoping

## Route Architecture

### Honeypot Routes (`routes/honeypot.py`)

| Method | Path | Response |
|--------|------|----------|
| `GET` | `/{path:path}` | Trap page with random links (catch-all) |
| `HEAD` | `/{path:path}` | 200 OK |
| `POST` | `/{path:path}` | Credential capture |
| `GET` | `/admin`, `/login` | Fake login form |
| `GET` | `/wp-admin`, `/wp-login.php` | Fake WordPress login |
| `GET` | `/phpmyadmin` | Fake phpMyAdmin |
| `GET` | `/robots.txt` | Honeypot paths advertised |
| `GET/POST` | `/api/search`, `/api/sql` | SQL injection honeypot |
| `POST` | `/api/contact` | XSS detection endpoint |
| `GET` | `/.env`, `/credentials.txt` | Fake sensitive files |

### Dashboard Routes (`routes/dashboard.py`)

| Method | Path | Response |
|--------|------|----------|
| `GET` | `/` | Server-rendered dashboard (Jinja2) |

### API Routes (`routes/api.py`)

| Method | Path | Response |
|--------|------|----------|
| `GET` | `/api/all-ips` | Paginated IP list with stats |
| `GET` | `/api/attackers` | Paginated attacker IPs |
| `GET` | `/api/ip-stats/{ip}` | Single IP detail |
| `GET` | `/api/credentials` | Captured credentials |
| `GET` | `/api/honeypot` | Honeypot trigger counts |
| `GET` | `/api/top-ips` | Top requesting IPs |
| `GET` | `/api/top-paths` | Most requested paths |
| `GET` | `/api/top-user-agents` | Top user agents |
| `GET` | `/api/attack-types-stats` | Attack type distribution |
| `GET` | `/api/attack-types` | Paginated attack log |
| `GET` | `/api/raw-request/{id}` | Full HTTP request |
| `GET` | `/api/get_banlist` | Export ban rules |

### HTMX Fragment Routes (`routes/htmx.py`)

Each returns a server-rendered Jinja2 partial (`hx-swap="innerHTML"`):

| Path | Template |
|------|----------|
| `/htmx/honeypot` | `honeypot_table.html` |
| `/htmx/top-ips` | `top_ips_table.html` |
| `/htmx/top-paths` | `top_paths_table.html` |
| `/htmx/top-ua` | `top_ua_table.html` |
| `/htmx/attackers` | `attackers_table.html` |
| `/htmx/credentials` | `credentials_table.html` |
| `/htmx/attacks` | `attack_types_table.html` |
| `/htmx/patterns` | `patterns_table.html` |
| `/htmx/ip-detail/{ip}` | `ip_detail.html` |

## Database Schema

```
┌─────────────────┐     ┌──────────────────┐
│   AccessLog     │     │ AttackDetection   │
├─────────────────┤     ├──────────────────┤
│ id (PK)         │◄────│ access_log_id(FK)│
│ ip (indexed)    │     │ attack_type      │
│ path            │     │ matched_pattern  │
│ user_agent      │     └──────────────────┘
│ method          │
│ is_suspicious   │     ┌──────────────────┐
│ is_honeypot     │     │CredentialAttempt │
│ timestamp       │     ├──────────────────┤
│ raw_request     │     │ id (PK)          │
└─────────────────┘     │ ip (indexed)     │
                        │ path, username   │
┌─────────────────┐     │ password         │
│    IpStats      │     │ timestamp        │
├─────────────────┤     └──────────────────┘
│ ip (PK)         │
│ total_requests  │     ┌──────────────────┐
│ first/last_seen │     │ CategoryHistory  │
│ country_code    │     ├──────────────────┤
│ city, lat, lon  │     │ id (PK)          │
│ asn, asn_org    │     │ ip (indexed)     │
│ isp, reverse    │     │ old_category     │
│ is_proxy        │     │ new_category     │
│ is_hosting      │     │ timestamp        │
│ list_on (JSON)  │     └──────────────────┘
│ category        │
│ category_scores │
│ analyzed_metrics│
│ manual_category │
└─────────────────┘
```

**SQLite config:** WAL mode, 30s busy timeout, file permissions 600.

## Frontend Architecture

```
base.html
  ├── CDN: Leaflet, Chart.js, HTMX, Alpine.js (deferred)
  ├── Static: dashboard.css
  │
  └── dashboard/index.html (extends base)
      │
      ├── Stats cards ──────────── Server-rendered on page load
      ├── Suspicious table ─────── Server-rendered on page load
      │
      ├── Overview tab (Alpine.js x-show)
      │   ├── Honeypot table ───── HTMX hx-get on load
      │   ├── Top IPs table ────── HTMX hx-get on load
      │   ├── Top Paths table ──── HTMX hx-get on load
      │   ├── Top UA table ─────── HTMX hx-get on load
      │   └── Credentials table ── HTMX hx-get on load
      │
      └── Attacks tab (Alpine.js x-show, lazy init)
          ├── Attackers table ──── HTMX hx-get on load
          ├── Map ──────────────── Leaflet (init on tab switch)
          ├── Chart ────────────── Chart.js (init on tab switch)
          ├── Attack types table ─ HTMX hx-get on load
          └── Patterns table ───── HTMX hx-get on load
```

**Responsibility split:**
- **Alpine.js** — Tab state, modals, dropdowns, lazy initialization
- **HTMX** — Table pagination, sorting, IP detail expansion
- **Leaflet** — Interactive map with category-colored markers
- **Chart.js** — Doughnut chart for attack type distribution
- **Custom SVG** — Radar charts for IP category scores

## Background Tasks

Managed by `TasksMaster` (APScheduler). Tasks are auto-discovered from `src/tasks/`.

| Task | Schedule | Purpose |
|------|----------|---------|
| `analyze_ips` | Every 1 min | Score IPs into categories (attacker, crawler, user) |
| `fetch_ip_rep` | Every 5 min | Enrich IPs with geolocation + blocklist data |
| `db_dump` | Configurable | Export database backups |
| `memory_cleanup` | Periodic | Trim in-memory lists |
| `top_attacking_ips` | Periodic | Cache top attackers |

### IP Categorization Model

Each IP is scored across 4 categories based on:
- HTTP method distribution (risky methods ratio)
- Robots.txt violations
- Request timing anomalies (coefficient of variation)
- User-Agent diversity
- Attack URL detection

Categories: `attacker`, `bad_crawler`, `good_crawler`, `regular_user`, `unknown`

## Configuration

`config.yaml` with environment variable overrides (`KRAWL_{FIELD}`):

```yaml
server:
  port: 5000
  delay: 100                    # Response delay (ms)

dashboard:
  secret_path: "test"           # Auto-generates if null

database:
  path: "data/krawl.db"
  retention_days: 30

crawl:
  infinite_pages_for_malicious: true
  max_pages_limit: 250
  ban_duration_seconds: 600

behavior:
  probability_error_codes: 0    # 0-100%

canary:
  token_url: null               # External canary alert URL
```

## Logging

Three rotating log files (1MB max, 5 backups each):

| Logger | File | Content |
|--------|------|---------|
| `krawl.app` | `logs/krawl.log` | Application events, errors |
| `krawl.access` | `logs/access.log` | HTTP access, attack detections |
| `krawl.credentials` | `logs/credentials.log` | Captured login attempts |

## Docker

```dockerfile
FROM python:3.11-slim
# Non-root user: krawl:1000
# Volumes: /app/logs, /app/data, /app/exports
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "5000", "--app-dir", "src"]
```

## Key Data Flows

### Honeypot Request

```
Client → BanCheck → DeceptionMiddleware → HoneypotRouter
                                              │
                                    ┌─────────┴──────────┐
                                    │ tracker.record()    │
                                    │   ├─ in-memory ++   │
                                    │   ├─ detect attacks │
                                    │   └─ DB persist     │
                                    └────────────────────┘
```

### Dashboard Load

```
Browser → GET /{secret}/ → SSR initial stats + Jinja2 render
       → Alpine.js init → HTMX fires hx-get for each table
       → User clicks Attacks tab → setTimeout → init Leaflet + Chart.js
       → Leaflet fetches /api/all-ips → plots markers
       → Chart.js fetches /api/attack-types-stats → renders doughnut
```

### IP Enrichment Pipeline

```
APScheduler (every 5 min)
  └─ fetch_ip_rep.main()
       ├─ DB: get unenriched IPs (limit 50)
       ├─ ip-api.com → geolocation (country, city, ASN, coords)
       ├─ iprep.lcrawl.com → blocklist memberships
       └─ DB: update IpStats with enriched data
```
docs: add architecture documentation for Krawl project 2026-02-17 14:34:48 +01:00			`# Krawl Architecture`

			`## Overview`

			`Krawl is a cloud-native deception honeypot server built on FastAPI. It creates realistic fake web applications (admin panels, login pages, fake credentials) to attract, detect, and analyze malicious crawlers and attackers while wasting their resources with infinite spider-trap pages.`

			`## Tech Stack`

			`\| Layer \| Technology \|`
			`\|-------\|-----------\|`
			`\| Backend \| FastAPI, Uvicorn, Python 3.11 \|`
			`\| ORM / DB \| SQLAlchemy 2.0, SQLite (WAL mode) \|`
			`\| Templating \| Jinja2 (server-side rendering) \|`
			`\| Reactivity \| Alpine.js 3.14 \|`
			`\| Partial Updates \| HTMX 2.0 \|`
			`\| Charts \| Chart.js 3.9 (doughnut), custom SVG radar \|`
			`\| Maps \| Leaflet 1.9 + CartoDB dark tiles \|`
			`\| Scheduling \| APScheduler \|`
			`\| Container \| Docker (python:3.11-slim), Helm/K8s ready \|`

			`## Directory Structure`

			```
			`Krawl/`
			`├── src/`
			`│ ├── app.py # FastAPI app factory + lifespan`
			`│ ├── config.py # YAML + env config loader`
			`│ ├── dependencies.py # DI providers (templates, DB, client IP)`
			`│ ├── database.py # DatabaseManager singleton`
			`│ ├── models.py # SQLAlchemy ORM models`
			`│ ├── tracker.py # In-memory + DB access tracking`
			`│ ├── logger.py # Rotating file log handlers`
			`│ ├── deception_responses.py # Attack detection + fake responses`
			`│ ├── sanitizer.py # Input sanitization`
			`│ ├── generators.py # Random content generators`
			`│ ├── wordlists.py # JSON wordlist loader`
			`│ ├── geo_utils.py # IP geolocation API`
			`│ ├── ip_utils.py # IP validation`
			`│ │`
			`│ ├── routes/`
			`│ │ ├── honeypot.py # Trap pages, credential capture, catch-all`
			`│ │ ├── dashboard.py # Dashboard page (Jinja2 SSR)`
			`│ │ ├── api.py # JSON API endpoints`
			`│ │ └── htmx.py # HTMX HTML fragment endpoints`
			`│ │`
			`│ ├── middleware/`
			`│ │ ├── deception.py # Path traversal / XXE / cmd injection detection`
			`│ │ └── ban_check.py # Banned IP enforcement`
			`│ │`
			`│ ├── tasks/ # APScheduler background jobs`
			`│ │ ├── analyze_ips.py # IP categorization scoring`
			`│ │ ├── fetch_ip_rep.py # Geolocation + blocklist enrichment`
			`│ │ ├── db_dump.py # Database export`
			`│ │ ├── memory_cleanup.py # In-memory list trimming`
			`│ │ └── top_attacking_ips.py # Top attacker caching`
			`│ │`
			`│ ├── tasks_master.py # Task discovery + APScheduler orchestrator`
			`│ ├── firewall/ # Banlist export (iptables, raw)`
			`│ ├── migrations/ # Schema migrations (auto-run)`
			`│ │`
			`│ └── templates/`
			`│ ├── jinja2/`
			`│ │ ├── base.html # Layout + CDN scripts`
			`│ │ └── dashboard/`
			`│ │ ├── index.html # Main dashboard page`
			`│ │ └── partials/ # 13 HTMX fragment templates`
			`│ ├── html/ # Deceptive trap page templates`
			`│ └── static/`
			`│ ├── css/dashboard.css`
			`│ └── js/`
			`│ ├── dashboard.js # Alpine.js app controller`
			`│ ├── map.js # Leaflet map`
			`│ ├── charts.js # Chart.js doughnut`
			`│ └── radar.js # SVG radar chart`
			`│`
			`├── config.yaml # Application configuration`
			`├── wordlists.json # Attack patterns + fake credentials`
			`├── Dockerfile # Container build`
			`├── docker-compose.yaml # Local orchestration`
			`├── entrypoint.sh # Container startup (gosu privilege drop)`
			`├── kubernetes/ # K8s manifests`
			`└── helm/ # Helm chart`
			```

			`## Application Entry Point`

			`src/app.py` uses the FastAPI application factory pattern with an async lifespan manager:

			```
			`Startup Shutdown`
			`│ │`
			`├─ Initialize logging └─ Log shutdown`
			`├─ Initialize SQLite DB`
			`├─ Create AccessTracker`
			`├─ Load webpages file (optional)`
			`├─ Store config + tracker in app.state`
			`├─ Start APScheduler background tasks`
			`└─ Log dashboard URL`
			```

			`## Request Pipeline`

			```
			`Request`
			`│`
			`▼`
			`┌──────────────────────┐`
			`│ BanCheckMiddleware │──→ IP banned? → Return 500`
			`└──────────┬───────────┘`
			`▼`
			`┌──────────────────────┐`
			`│ DeceptionMiddleware │──→ Attack detected? → Fake error response`
			`└──────────┬───────────┘`
			`▼`
			`┌───────────────────────┐`
			`│ ServerHeaderMiddleware│──→ Add random Server header`
			`└──────────┬────────────┘`
			`▼`
			`┌───────────────────────┐`
			`│ Route Matching │`
			`│ (ordered by priority)│`
			`│ │`
			`│ 1. Static files │ /{secret}/static/*`
			`│ 2. Dashboard router │ /{secret}/ (prefix-based)`
			`│ 3. API router │ /{secret}/api/* (prefix-based)`
			`│ 4. HTMX router │ /{secret}/htmx/* (prefix-based)`
			`│ 5. Honeypot router │ /* (catch-all)`
			`└───────────────────────┘`
			```

			`### Prefix-Based Routing`

			Dashboard, API, and HTMX routers are mounted with `prefix=f"/{secret}"` in `app.py`. This means:
			- Route handlers define paths without the secret (e.g., `@router.get("/api/all-ips")`)
			- FastAPI prepends the secret automatically (e.g., `GET /a1b2c3/api/all-ips`)
			- The honeypot catch-all `/{path:path}` only matches paths that don't start with the secret
			- No `_is_dashboard_path()` checks needed — the prefix handles access scoping

			`## Route Architecture`

			### Honeypot Routes (`routes/honeypot.py`)

			`\| Method \| Path \| Response \|`
			`\|--------\|------\|----------\|`
			\| `GET` \| `/{path:path}` \| Trap page with random links (catch-all) \|
			\| `HEAD` \| `/{path:path}` \| 200 OK \|
			\| `POST` \| `/{path:path}` \| Credential capture \|
			\| `GET` \| `/admin`, `/login` \| Fake login form \|
			\| `GET` \| `/wp-admin`, `/wp-login.php` \| Fake WordPress login \|
			\| `GET` \| `/phpmyadmin` \| Fake phpMyAdmin \|
			\| `GET` \| `/robots.txt` \| Honeypot paths advertised \|
			\| `GET/POST` \| `/api/search`, `/api/sql` \| SQL injection honeypot \|
			\| `POST` \| `/api/contact` \| XSS detection endpoint \|
			\| `GET` \| `/.env`, `/credentials.txt` \| Fake sensitive files \|

			### Dashboard Routes (`routes/dashboard.py`)

			`\| Method \| Path \| Response \|`
			`\|--------\|------\|----------\|`
			\| `GET` \| `/` \| Server-rendered dashboard (Jinja2) \|

			### API Routes (`routes/api.py`)

			`\| Method \| Path \| Response \|`
			`\|--------\|------\|----------\|`
			\| `GET` \| `/api/all-ips` \| Paginated IP list with stats \|
			\| `GET` \| `/api/attackers` \| Paginated attacker IPs \|
			\| `GET` \| `/api/ip-stats/{ip}` \| Single IP detail \|
			\| `GET` \| `/api/credentials` \| Captured credentials \|
			\| `GET` \| `/api/honeypot` \| Honeypot trigger counts \|
			\| `GET` \| `/api/top-ips` \| Top requesting IPs \|
			\| `GET` \| `/api/top-paths` \| Most requested paths \|
			\| `GET` \| `/api/top-user-agents` \| Top user agents \|
			\| `GET` \| `/api/attack-types-stats` \| Attack type distribution \|
			\| `GET` \| `/api/attack-types` \| Paginated attack log \|
			\| `GET` \| `/api/raw-request/{id}` \| Full HTTP request \|
			\| `GET` \| `/api/get_banlist` \| Export ban rules \|

			### HTMX Fragment Routes (`routes/htmx.py`)

			Each returns a server-rendered Jinja2 partial (`hx-swap="innerHTML"`):

			`\| Path \| Template \|`
			`\|------\|----------\|`
			\| `/htmx/honeypot` \| `honeypot_table.html` \|
			\| `/htmx/top-ips` \| `top_ips_table.html` \|
			\| `/htmx/top-paths` \| `top_paths_table.html` \|
			\| `/htmx/top-ua` \| `top_ua_table.html` \|
			\| `/htmx/attackers` \| `attackers_table.html` \|
			\| `/htmx/credentials` \| `credentials_table.html` \|
			\| `/htmx/attacks` \| `attack_types_table.html` \|
			\| `/htmx/patterns` \| `patterns_table.html` \|
			\| `/htmx/ip-detail/{ip}` \| `ip_detail.html` \|

			`## Database Schema`

			```
			`┌─────────────────┐ ┌──────────────────┐`
			`│ AccessLog │ │ AttackDetection │`
			`├─────────────────┤ ├──────────────────┤`
			`│ id (PK) │◄────│ access_log_id(FK)│`
			`│ ip (indexed) │ │ attack_type │`
			`│ path │ │ matched_pattern │`
			`│ user_agent │ └──────────────────┘`
			`│ method │`
			`│ is_suspicious │ ┌──────────────────┐`
			`│ is_honeypot │ │CredentialAttempt │`
			`│ timestamp │ ├──────────────────┤`
			`│ raw_request │ │ id (PK) │`
			`└─────────────────┘ │ ip (indexed) │`
			`│ path, username │`
			`┌─────────────────┐ │ password │`
			`│ IpStats │ │ timestamp │`
			`├─────────────────┤ └──────────────────┘`
			`│ ip (PK) │`
			`│ total_requests │ ┌──────────────────┐`
			`│ first/last_seen │ │ CategoryHistory │`
			`│ country_code │ ├──────────────────┤`
			`│ city, lat, lon │ │ id (PK) │`
			`│ asn, asn_org │ │ ip (indexed) │`
			`│ isp, reverse │ │ old_category │`
			`│ is_proxy │ │ new_category │`
			`│ is_hosting │ │ timestamp │`
			`│ list_on (JSON) │ └──────────────────┘`
			`│ category │`
			`│ category_scores │`
			`│ analyzed_metrics│`
			`│ manual_category │`
			`└─────────────────┘`
			```

			`SQLite config: WAL mode, 30s busy timeout, file permissions 600.`

			`## Frontend Architecture`

			```
			`base.html`
			`├── CDN: Leaflet, Chart.js, HTMX, Alpine.js (deferred)`
			`├── Static: dashboard.css`
			`│`
			`└── dashboard/index.html (extends base)`
			`│`
			`├── Stats cards ──────────── Server-rendered on page load`
			`├── Suspicious table ─────── Server-rendered on page load`
			`│`
			`├── Overview tab (Alpine.js x-show)`
			`│ ├── Honeypot table ───── HTMX hx-get on load`
			`│ ├── Top IPs table ────── HTMX hx-get on load`
			`│ ├── Top Paths table ──── HTMX hx-get on load`
			`│ ├── Top UA table ─────── HTMX hx-get on load`
			`│ └── Credentials table ── HTMX hx-get on load`
			`│`
			`└── Attacks tab (Alpine.js x-show, lazy init)`
			`├── Attackers table ──── HTMX hx-get on load`
			`├── Map ──────────────── Leaflet (init on tab switch)`
			`├── Chart ────────────── Chart.js (init on tab switch)`
			`├── Attack types table ─ HTMX hx-get on load`
			`└── Patterns table ───── HTMX hx-get on load`
			```

			`Responsibility split:`
			`- Alpine.js — Tab state, modals, dropdowns, lazy initialization`
			`- HTMX — Table pagination, sorting, IP detail expansion`
			`- Leaflet — Interactive map with category-colored markers`
			`- Chart.js — Doughnut chart for attack type distribution`
			`- Custom SVG — Radar charts for IP category scores`

			`## Background Tasks`

			Managed by `TasksMaster` (APScheduler). Tasks are auto-discovered from `src/tasks/`.

			`\| Task \| Schedule \| Purpose \|`
			`\|------\|----------\|---------\|`
			\| `analyze_ips` \| Every 1 min \| Score IPs into categories (attacker, crawler, user) \|
			\| `fetch_ip_rep` \| Every 5 min \| Enrich IPs with geolocation + blocklist data \|
			\| `db_dump` \| Configurable \| Export database backups \|
			\| `memory_cleanup` \| Periodic \| Trim in-memory lists \|
			\| `top_attacking_ips` \| Periodic \| Cache top attackers \|

			`### IP Categorization Model`

			`Each IP is scored across 4 categories based on:`
			`- HTTP method distribution (risky methods ratio)`
			`- Robots.txt violations`
			`- Request timing anomalies (coefficient of variation)`
			`- User-Agent diversity`
			`- Attack URL detection`

			Categories: `attacker`, `bad_crawler`, `good_crawler`, `regular_user`, `unknown`

			`## Configuration`

			`config.yaml` with environment variable overrides (`KRAWL_{FIELD}`):

			```yaml
			`server:`
			`port: 5000`
			`delay: 100 # Response delay (ms)`

			`dashboard:`
			`secret_path: "test" # Auto-generates if null`

			`database:`
			`path: "data/krawl.db"`
			`retention_days: 30`

			`crawl:`
			`infinite_pages_for_malicious: true`
			`max_pages_limit: 250`
			`ban_duration_seconds: 600`

			`behavior:`
			`probability_error_codes: 0 # 0-100%`

			`canary:`
			`token_url: null # External canary alert URL`
			```

			`## Logging`

			`Three rotating log files (1MB max, 5 backups each):`

			`\| Logger \| File \| Content \|`
			`\|--------\|------\|---------\|`
			\| `krawl.app` \| `logs/krawl.log` \| Application events, errors \|
			\| `krawl.access` \| `logs/access.log` \| HTTP access, attack detections \|
			\| `krawl.credentials` \| `logs/credentials.log` \| Captured login attempts \|

			`## Docker`

			```dockerfile
			`FROM python:3.11-slim`
			`# Non-root user: krawl:1000`
			`# Volumes: /app/logs, /app/data, /app/exports`
			`CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "5000", "--app-dir", "src"]`
			```

			`## Key Data Flows`

			`### Honeypot Request`

			```
			`Client → BanCheck → DeceptionMiddleware → HoneypotRouter`
			`│`
			`┌─────────┴──────────┐`
			`│ tracker.record() │`
			`│ ├─ in-memory ++ │`
			`│ ├─ detect attacks │`
			`│ └─ DB persist │`
			`└────────────────────┘`
			```

			`### Dashboard Load`

			```
			`Browser → GET /{secret}/ → SSR initial stats + Jinja2 render`
			`→ Alpine.js init → HTMX fires hx-get for each table`
			`→ User clicks Attacks tab → setTimeout → init Leaflet + Chart.js`
			`→ Leaflet fetches /api/all-ips → plots markers`
			`→ Chart.js fetches /api/attack-types-stats → renders doughnut`
			```

			`### IP Enrichment Pipeline`

			```
			`APScheduler (every 5 min)`
			`└─ fetch_ip_rep.main()`
			`├─ DB: get unenriched IPs (limit 50)`
			`├─ ip-api.com → geolocation (country, city, ASN, coords)`
			`├─ iprep.lcrawl.com → blocklist memberships`
			`└─ DB: update IpStats with enriched data`
			```