Merge pull request #111 from BlessedRebuS/dev

Feat/release 1.1.0
Lorenzo Venerandi
2026-03-01 22:09:10 +01:00
committed by GitHub
123 changed files with 10936 additions and 5760 deletions


@@ -20,7 +20,7 @@ jobs:
- uses: actions/setup-python@v5
with:
python-version: '3.11'
python-version: '3.13'
cache: 'pip'
- name: Install dependencies


@@ -19,7 +19,7 @@ jobs:
- uses: actions/setup-python@v5
with:
python-version: '3.11'
python-version: '3.13'
cache: 'pip'
- name: Install dependencies
@@ -48,12 +48,4 @@ jobs:
- name: Safety check for dependencies
run: safety check --json || true
- name: Trivy vulnerability scan
uses: aquasecurity/trivy-action@master
with:
scan-type: 'fs'
scan-ref: '.'
format: 'table'
severity: 'CRITICAL,HIGH'
exit-code: '1'

6
.gitignore vendored

@@ -68,6 +68,7 @@ data/
*.db
*.sqlite
*.sqlite3
backups/
# Temporary files
*.tmp
@@ -80,4 +81,7 @@ personal-values.yaml
#exports dir (keeping .gitkeep so we have the dir)
/exports/*
/src/exports/*
/src/exports/*
# tmux config
.tmux.conf


@@ -1,15 +1,16 @@
FROM python:3.11-slim
FROM python:3.13-slim
LABEL org.opencontainers.image.source=https://github.com/BlessedRebuS/Krawl
WORKDIR /app
# Install gosu for dropping privileges
RUN apt-get update && apt-get install -y --no-install-recommends gosu && \
RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends gosu && \
rm -rf /var/lib/apt/lists/*
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir -r requirements.txt
COPY src/ /app/src/
COPY wordlists.json /app/
@@ -26,4 +27,4 @@ EXPOSE 5000
ENV PYTHONUNBUFFERED=1
ENTRYPOINT ["/app/entrypoint.sh"]
CMD ["python3", "src/server.py"]
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "5000", "--app-dir", "src"]

340
README.md

@@ -33,21 +33,25 @@
<img src="https://img.shields.io/badge/helm-chart-0F1689?logo=helm&logoColor=white" alt="Helm Chart">
</a>
</div>
<br>
<p align="center">
<a href="#what-is-krawl">What is Krawl?</a> •
<a href="#-installation">Installation</a> •
<a href="#honeypot-pages">Honeypot Pages</a> •
<a href="#dashboard">Dashboard</a> •
<a href="./ToDo.md">Todo</a> •
<a href="#-contributing">Contributing</a>
</p>
<br>
</div>
## Table of Contents
- [Demo](#demo)
- [What is Krawl?](#what-is-krawl)
- [Krawl Dashboard](#krawl-dashboard)
- [Installation](#-installation)
- [Docker Run](#docker-run)
- [Docker Compose](#docker-compose)
- [Kubernetes](#kubernetes)
- [Configuration](#configuration)
- [config.yaml](#configuration-via-configyaml)
- [Environment Variables](#configuration-via-enviromental-variables)
- [Ban Malicious IPs](#use-krawl-to-ban-malicious-ips)
- [IP Reputation](#ip-reputation)
- [Forward Server Header](#forward-server-header)
- [Additional Documentation](#additional-documentation)
- [Contributing](#-contributing)
## Demo
Tip: crawl the `robots.txt` paths for additional fun
### Krawl URL: [http://demo.krawlme.com](http://demo.krawlme.com)
@@ -67,7 +71,7 @@ It features:
- **Fake Login Pages**: WordPress, phpMyAdmin, admin panels
- **Honeypot Paths**: Advertised in robots.txt to catch scanners
- **Fake Credentials**: Realistic-looking usernames, passwords, API keys
- **[Canary Token](#customizing-the-canary-token) Integration**: External alert triggering
- **[Canary Token](docs/canary-token.md) Integration**: External alert triggering
- **Random server headers**: Confuse attacks based on server header and version
- **Real-time Dashboard**: Monitor suspicious activity
- **Customizable Wordlists**: Easy JSON-based configuration
@@ -75,8 +79,28 @@ It features:
![dashboard](img/deception-page.png)
## Krawl Dashboard
Krawl provides a comprehensive dashboard, accessible at a **random secret path** generated at startup or at a **custom path** configured via `KRAWL_DASHBOARD_SECRET_PATH`. This keeps the dashboard hidden from attackers scanning your honeypot.
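As a rough illustration of the two options described above, the sketch below resolves the dashboard path from `KRAWL_DASHBOARD_SECRET_PATH` and falls back to a random one. The helper name `resolve_dashboard_path` and the use of `secrets.token_urlsafe` are assumptions made for this example, not necessarily how Krawl generates its path.

```python
import secrets


def resolve_dashboard_path(env: dict[str, str]) -> str:
    """Return the configured dashboard path, or a random one if none is set."""
    configured = env.get("KRAWL_DASHBOARD_SECRET_PATH", "").strip()
    if configured:
        # Normalise so the path always starts with a single forward slash.
        return "/" + configured.lstrip("/")
    # Hypothetical fallback: 16 random bytes, URL-safe encoded.
    return "/" + secrets.token_urlsafe(16)
```

Calling it with `os.environ` would mirror the Docker examples in this README, where the variable is passed with `-e`.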
The dashboard is organized in three main tabs:
- **Overview** — High-level view of attack activity: an interactive map of IP origins, recent suspicious requests, and top IPs, User-Agents, and paths.
![geoip](img/geoip_dashboard.png)
- **Attacks** — Detailed breakdown of captured credentials, honeypot triggers, and detected attack types (SQLi, XSS, path traversal, etc.) with charts and tables.
![attack_types](img/attack_types.png)
- **IP Insight** — In-depth forensic view of a selected IP: geolocation, ISP/ASN info, reputation flags, behavioral timeline, attack type distribution, and full access history.
![ipinsight](img/ip_insight_dashboard.png)
For more details, see the [Dashboard documentation](docs/dashboard.md).
## 🚀 Installation
### Docker Run
@@ -89,7 +113,7 @@ docker run -d \
-e KRAWL_PORT=5000 \
-e KRAWL_DELAY=100 \
-e KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard" \
-e KRAWL_DATABASE_RETENTION_DAYS=30 \
-v krawl-data:/app/data \
--name krawl \
ghcr.io/blessedrebus/krawl:latest
```
@@ -112,6 +136,8 @@ services:
- TZ=Europe/Rome
volumes:
- ./config.yaml:/app/config.yaml:ro
# bind mount for firewall exporters
- ./exports:/app/exports
- krawl-data:/app/data
restart: unless-stopped
@@ -134,6 +160,75 @@ docker-compose down
### Kubernetes
**Krawl is also available natively on Kubernetes**. Installation can be done either [via manifest](kubernetes/README.md) or [using the helm chart](helm/README.md).
## Configuration
Krawl uses a **configuration hierarchy** in which **environment variables take precedence over the configuration file**. This approach is recommended for Docker deployments and quick out-of-the-box customization.
### Configuration via config.yaml
You can use the [config.yaml](config.yaml) file for advanced configurations, such as Docker Compose or Helm chart deployments.
### Configuration via Enviromental Variables
| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| `CONFIG_LOCATION` | Path to yaml config file | `config.yaml` |
| `KRAWL_PORT` | Server listening port | `5000` |
| `KRAWL_DELAY` | Response delay in milliseconds | `100` |
| `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` |
| `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` |
| `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` |
| `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefgh...` |
| `KRAWL_MAX_COUNTER` | Initial counter value | `10` |
| `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None |
| `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` |
| `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated |
| `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` |
| `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` |
| `KRAWL_EXPORTS_PATH` | Path where firewall rule sets are exported | `exports` |
| `KRAWL_BACKUPS_PATH` | Path where database dumps are saved | `backups` |
| `KRAWL_BACKUPS_CRON` | Cron expression that controls the backup job schedule | `*/30 * * * *` |
| `KRAWL_BACKUPS_ENABLED` | Boolean to enable db dump job | `true` |
| `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in database | `30` |
| `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` |
| `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` |
| `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` |
| `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` |
| `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` |
| `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` |
| `KRAWL_INFINITE_PAGES_FOR_MALICIOUS` | Serve infinite pages to malicious IPs | `true` |
| `KRAWL_MAX_PAGES_LIMIT` | Maximum page limit for crawlers | `250` |
| `KRAWL_BAN_DURATION_SECONDS` | Ban duration in seconds for rate-limited IPs | `600` |
For example
```bash
# Set canary token
export CONFIG_LOCATION="config.yaml"
export KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url"
# Set number of pages range (min,max format)
export KRAWL_LINKS_PER_PAGE_RANGE="5,25"
# Set analyzer thresholds
export KRAWL_HTTP_RISKY_METHODS_THRESHOLD="0.2"
export KRAWL_VIOLATED_ROBOTS_THRESHOLD="0.15"
# Set custom dashboard path
export KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard"
```
Example of a Docker run with env variables:
```bash
docker run -d \
-p 5000:5000 \
-e KRAWL_PORT=5000 \
-e KRAWL_DELAY=100 \
-e KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" \
--name krawl \
ghcr.io/blessedrebus/krawl:latest
```
## Use Krawl to Ban Malicious IPs
Krawl uses a reputation-based system to classify attacker IP addresses. Every five minutes, Krawl exports the identified malicious IPs to a `malicious_ips.txt` file.
@@ -143,7 +238,11 @@ This file can either be mounted from the Docker container into another system or
curl https://your-krawl-instance/<DASHBOARD-PATH>/api/download/malicious_ips.txt
```
This file can be used to [update a set of firewall rules](https://www.allthingstech.ch/using-opnsense-and-ip-blocklists-to-block-malicious-traffic), for example on OPNsense and pfSense, enabling automatic blocking of malicious IPs or using IPtables
This file enables automatic blocking of malicious traffic across various platforms. You can use it to update firewall rules on:
* [OPNsense and pfSense](https://www.allthingstech.ch/using-opnsense-and-ip-blocklists-to-block-malicious-traffic)
* [RouterOS](https://rentry.co/krawl-routeros)
* [IPtables](plugins/iptables/README.md) and [Nftables](plugins/nftables/README.md)
* [Fail2Ban](plugins/fail2ban/README.md)
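A minimal sketch of how such a blocklist could be turned into firewall commands is shown below. It assumes `malicious_ips.txt` contains one IP address per line (the exact file format is not described here), and the function name `iptables_drop_rules` is hypothetical. The sketch only builds command strings and does not run them.

```python
import ipaddress


def iptables_drop_rules(blocklist_text: str) -> list[str]:
    """Build one DROP rule per valid IP found in the blocklist text."""
    rules = []
    for line in blocklist_text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        try:
            ip = ipaddress.ip_address(entry)
        except ValueError:
            continue  # skip anything that is not a valid IP address
        # IPv6 addresses are handled by ip6tables rather than iptables.
        tool = "iptables" if ip.version == 4 else "ip6tables"
        rules.append(f"{tool} -A INPUT -s {ip} -j DROP")
    return rules
```

Validating each entry before building a rule keeps malformed lines out of the generated commands. The linked plugin READMEs remain the reference for the supported integrations.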
## IP Reputation
Krawl [uses tasks that analyze recent traffic to build and continuously update an IP reputation](src/tasks/analyze_ips.py) score. It runs periodically and evaluates each active IP address based on multiple behavioral indicators to classify it as an attacker, crawler, or regular user. Thresholds are fully customizable.
@@ -176,194 +275,17 @@ location / {
}
```
## API
Krawl uses the following APIs
- https://iprep.lcrawl.com (IP Reputation)
- https://nominatim.openstreetmap.org/reverse (Reverse IP Lookup)
- https://api.ipify.org (Public IP discovery)
- http://ident.me (Public IP discovery)
- https://ifconfig.me (Public IP discovery)
## Additional Documentation
## Configuration
Krawl uses a **configuration hierarchy** in which **environment variables take precedence over the configuration file**. This approach is recommended for Docker deployments and quick out-of-the-box customization.
### Configuration via Enviromental Variables
| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| `CONFIG_LOCATION` | Path to yaml config file | `config.yaml` |
| `KRAWL_PORT` | Server listening port | `5000` |
| `KRAWL_DELAY` | Response delay in milliseconds | `100` |
| `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` |
| `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` |
| `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` |
| `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefgh...` |
| `KRAWL_MAX_COUNTER` | Initial counter value | `10` |
| `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None |
| `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` |
| `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated |
| `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` |
| `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` |
| `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in database | `30` |
| `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` |
| `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` |
| `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` |
| `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` |
| `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` |
| `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` |
| `KRAWL_INFINITE_PAGES_FOR_MALICIOUS` | Serve infinite pages to malicious IPs | `true` |
| `KRAWL_MAX_PAGES_LIMIT` | Maximum page limit for crawlers | `250` |
| `KRAWL_BAN_DURATION_SECONDS` | Ban duration in seconds for rate-limited IPs | `600` |
For example
```bash
# Set canary token
export CONFIG_LOCATION="config.yaml"
export KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url"
# Set number of pages range (min,max format)
export KRAWL_LINKS_PER_PAGE_RANGE="5,25"
# Set analyzer thresholds
export KRAWL_HTTP_RISKY_METHODS_THRESHOLD="0.2"
export KRAWL_VIOLATED_ROBOTS_THRESHOLD="0.15"
# Set custom dashboard path
export KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard"
```
Example of a Docker run with env variables:
```bash
docker run -d \
-p 5000:5000 \
-e KRAWL_PORT=5000 \
-e KRAWL_DELAY=100 \
-e KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" \
--name krawl \
ghcr.io/blessedrebus/krawl:latest
```
### Configuration via config.yaml
You can use the [config.yaml](config.yaml) file for more advanced configurations, such as Docker Compose or Helm chart deployments.
# Honeypot
Below is a complete overview of the Krawl honeypots capabilities
## robots.txt
The actual (juicy) robots.txt configuration [is the following](src/templates/html/robots.txt).
## Honeypot pages
Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).
![admin page](img/admin-page.png)
Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with “interesting” files, each assigned a random file size to look realistic.
![directory-page](img/directory-page.png)
The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a “juicy” misconfiguration that crawlers and scanners often flag as information leakage.
The `/server` page displays randomly generated fake error information for each known server.
![server and env page](img/server-and-env-page.png)
The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format
![users and secrets](img/users-and-secrets.png)
The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets
![credentials and passwords](img/credentials-and-passwords.png)
Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.
![sql injection](img/sql_injection.png)
Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
## Example usage behind reverse proxy
You can configure a reverse proxy so all web requests land on the Krawl page by default, and hide your real content behind a secret hidden url. For example:
```bash
location / {
proxy_pass https://your-krawl-instance;
proxy_pass_header Server;
}
location /my-hidden-service {
proxy_pass https://my-hidden-service;
proxy_pass_header Server;
}
```
Alternatively, you can create a bunch of different "interesting" looking domains. For example:
- admin.example.com
- portal.example.com
- sso.example.com
- login.example.com
- ...
Additionally, you may configure your reverse proxy to forward all non-existing subdomains (e.g. nonexistent.example.com) to one of these domains so that any crawlers that are guessing domains at random will automatically end up at your Krawl instance.
## Customizing the Canary Token
To create a custom canary token, visit https://canarytokens.org
and generate a “Web bug” canary token.
This optional token is triggered when a crawler fully traverses the webpage until it reaches 0. At that point, a URL is returned. When this URL is requested, it sends an alert to the user via email, including the visitors IP address and user agent.
To enable this feature, set the canary token URL [using the environment variable](#configuration-via-environment-variables) `KRAWL_CANARY_TOKEN_URL`.
## Customizing the wordlist
Edit `wordlists.json` to customize fake data for your use case
```json
{
"usernames": {
"prefixes": ["admin", "root", "user"],
"suffixes": ["_prod", "_dev", "123"]
},
"passwords": {
"prefixes": ["P@ssw0rd", "Admin"],
"simple": ["test", "password"]
},
"directory_listing": {
"files": ["credentials.txt", "backup.sql"],
"directories": ["admin/", "backup/"]
}
}
```
or **values.yaml** in the case of helm chart installation
## Dashboard
Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`
The dashboard shows:
- Total and unique accesses
- Suspicious activity and attack detection
- Top IPs, paths, user-agents and GeoIP localization
- Real-time monitoring
The attackers access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
![dashboard-1](img/dashboard-1.png)
The top IP Addresses is shown along with top paths and User Agents
![dashboard-2](img/dashboard-2.png)
![dashboard-3](img/dashboard-3.png)
| Topic | Description |
|-------|-------------|
| [API](docs/api.md) | External APIs used by Krawl for IP data, reputation, and geolocation |
| [Honeypot](docs/honeypot.md) | Full overview of honeypot pages: fake logins, directory listings, credential files, SQLi/XSS/XXE/command injection traps, and more |
| [Reverse Proxy](docs/reverse-proxy.md) | How to deploy Krawl behind NGINX or use decoy subdomains |
| [Database Backups](docs/backups.md) | Enable and configure the automatic database dump job |
| [Canary Token](docs/canary-token.md) | Set up external alert triggers via canarytokens.org |
| [Wordlist](docs/wordlist.md) | Customize fake usernames, passwords, and directory listings |
| [Dashboard](docs/dashboard.md) | Access and explore the real-time monitoring dashboard |
## 🤝 Contributing
@@ -374,13 +296,9 @@ Contributions welcome! Please:
4. Submit a pull request (explain the changes!)
<div align="center">
## ⚠️ Disclaimer
**This is a deception/honeypot system.**
Deploy in isolated environments and monitor carefully for security events.
Use responsibly and in compliance with applicable laws and regulations.
## Disclaimer
> [!CAUTION]
> This is a deception/honeypot system. Deploy in isolated environments and monitor carefully for security events. Use responsibly and in compliance with applicable laws and regulations.
## Star History
<img src="https://api.star-history.com/svg?repos=BlessedRebuS/Krawl&type=Date" width="600" alt="Star History Chart" />
<img src="https://api.star-history.com/svg?repos=BlessedRebuS/Krawl&type=Date" width="600" alt="Star History Chart" />


@@ -1,5 +0,0 @@
# Krawl - Todo List
- Add Prometheus exporter for metrics
- Add POST credentials information (e.g. username and password used)
- Add CloudFlare error pages


@@ -23,7 +23,18 @@ dashboard:
# if set to "null" (or left unset), a random path is auto-generated
# can be set to "/dashboard" or similar <-- note this MUST include a forward slash
# secret_path: super-secret-dashboard-path
secret_path: test
secret_path: null
backups:
path: "backups"
cron: "*/30 * * * *"
enabled: false
exports:
path: "exports"
logging:
level: "DEBUG" # DEBUG, INFO, WARNING, ERROR, CRITICAL
database:
path: "data/krawl.db"
@@ -43,4 +54,4 @@ analyzer:
crawl:
infinite_pages_for_malicious: true
max_pages_limit: 250
ban_duration_seconds: 600
ban_duration_seconds: 600


@@ -1,4 +1,5 @@
---
# THIS IS FOR DEVELOPMENT PURPOSES
services:
krawl:
build:
@@ -16,17 +17,14 @@ services:
- ./config.yaml:/app/config.yaml:ro
- ./logs:/app/logs
- ./exports:/app/exports
- data:/app/data
- ./data:/app/data
- ./backups:/app/backups
restart: unless-stopped
develop:
watch:
- path: ./Dockerfile
action: rebuild
- path: ./src/
action: sync+restart
target: /app/src
action: rebuild
- path: ./docker-compose.yaml
action: rebuild
volumes:
data:

9
docs/api.md Normal file

@@ -0,0 +1,9 @@
# API
Krawl uses the following external APIs:
- http://ip-api.com (IP Data)
- https://iprep.lcrawl.com (IP Reputation)
- https://nominatim.openstreetmap.org/reverse (Reverse IP Lookup)
- https://api.ipify.org (Public IP discovery)
- http://ident.me (Public IP discovery)
- https://ifconfig.me (Public IP discovery)

372
docs/architecture.md Normal file

@@ -0,0 +1,372 @@
# Krawl Architecture
## Overview
Krawl is a cloud-native deception honeypot server built on **FastAPI**. It creates realistic fake web applications (admin panels, login pages, fake credentials) to attract, detect, and analyze malicious crawlers and attackers while wasting their resources with infinite spider-trap pages.
## Tech Stack
| Layer | Technology |
|-------|-----------|
| **Backend** | FastAPI, Uvicorn, Python 3.13 |
| **ORM / DB** | SQLAlchemy 2.0, SQLite (WAL mode) |
| **Templating** | Jinja2 (server-side rendering) |
| **Reactivity** | Alpine.js 3.14 |
| **Partial Updates** | HTMX 2.0 |
| **Charts** | Chart.js 3.9 (doughnut), custom SVG radar |
| **Maps** | Leaflet 1.9 + CartoDB dark tiles |
| **Scheduling** | APScheduler |
| **Container** | Docker (python:3.13-slim), Helm/K8s ready |
## Directory Structure
```
Krawl/
├── src/
│ ├── app.py # FastAPI app factory + lifespan
│ ├── config.py # YAML + env config loader
│ ├── dependencies.py # DI providers (templates, DB, client IP)
│ ├── database.py # DatabaseManager singleton
│ ├── models.py # SQLAlchemy ORM models
│ ├── tracker.py # In-memory + DB access tracking
│ ├── logger.py # Rotating file log handlers
│ ├── deception_responses.py # Attack detection + fake responses
│ ├── sanitizer.py # Input sanitization
│ ├── generators.py # Random content generators
│ ├── wordlists.py # JSON wordlist loader
│ ├── geo_utils.py # IP geolocation API
│ ├── ip_utils.py # IP validation
│ │
│ ├── routes/
│ │ ├── honeypot.py # Trap pages, credential capture, catch-all
│ │ ├── dashboard.py # Dashboard page (Jinja2 SSR)
│ │ ├── api.py # JSON API endpoints
│ │ └── htmx.py # HTMX HTML fragment endpoints
│ │
│ ├── middleware/
│ │ ├── deception.py # Path traversal / XXE / cmd injection detection
│ │ └── ban_check.py # Banned IP enforcement
│ │
│ ├── tasks/ # APScheduler background jobs
│ │ ├── analyze_ips.py # IP categorization scoring
│ │ ├── fetch_ip_rep.py # Geolocation + blocklist enrichment
│ │ ├── db_dump.py # Database export
│ │ ├── memory_cleanup.py # In-memory list trimming
│ │ └── top_attacking_ips.py # Top attacker caching
│ │
│ ├── tasks_master.py # Task discovery + APScheduler orchestrator
│ ├── firewall/ # Banlist export (iptables, raw)
│ ├── migrations/ # Schema migrations (auto-run)
│ │
│ └── templates/
│ ├── jinja2/
│ │ ├── base.html # Layout + CDN scripts
│ │ └── dashboard/
│ │ ├── index.html # Main dashboard page
│ │ └── partials/ # 13 HTMX fragment templates
│ ├── html/ # Deceptive trap page templates
│ └── static/
│ ├── css/dashboard.css
│ └── js/
│ ├── dashboard.js # Alpine.js app controller
│ ├── map.js # Leaflet map
│ ├── charts.js # Chart.js doughnut
│ └── radar.js # SVG radar chart
├── config.yaml # Application configuration
├── wordlists.json # Attack patterns + fake credentials
├── Dockerfile # Container build
├── docker-compose.yaml # Local orchestration
├── entrypoint.sh # Container startup (gosu privilege drop)
├── kubernetes/ # K8s manifests
└── helm/ # Helm chart
```
## Application Entry Point
`src/app.py` uses the **FastAPI application factory** pattern with an async lifespan manager:
```
Startup                          Shutdown
│                                │
├─ Initialize logging            └─ Log shutdown
├─ Initialize SQLite DB
├─ Create AccessTracker
├─ Load webpages file (optional)
├─ Store config + tracker in app.state
├─ Start APScheduler background tasks
└─ Log dashboard URL
```
## Request Pipeline
```
Request
┌──────────────────────┐
│ BanCheckMiddleware │──→ IP banned? → Return 500
└──────────┬───────────┘
┌──────────────────────┐
│ DeceptionMiddleware │──→ Attack detected? → Fake error response
└──────────┬───────────┘
┌───────────────────────┐
│ ServerHeaderMiddleware│──→ Add random Server header
└──────────┬────────────┘
┌───────────────────────┐
│ Route Matching │
│ (ordered by priority)│
│ │
│ 1. Static files │ /{secret}/static/*
│ 2. Dashboard router │ /{secret}/ (prefix-based)
│ 3. API router │ /{secret}/api/* (prefix-based)
│ 4. HTMX router │ /{secret}/htmx/* (prefix-based)
│ 5. Honeypot router │ /* (catch-all)
└───────────────────────┘
```
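The pipeline above can be sketched as a chain of short-circuiting checks. This is a simplified stand-in written with plain functions rather than FastAPI middleware. The names `handle` and `looks_like_attack`, the toy path traversal check, and the choice to add the `Server` header only to routed responses are all assumptions for illustration.

```python
import random


def looks_like_attack(path: str) -> bool:
    # Toy detector: flags only a basic path traversal sequence.
    return "../" in path


def handle(request: dict, banned_ips: set[str], server_headers: list[str]) -> dict:
    """Walk a request through the ban check, deception check, and routing."""
    # 1. Ban check: banned IPs are answered with a 500 and go no further.
    if request["ip"] in banned_ips:
        return {"handled_by": "ban_check", "status": 500}
    # 2. Deception: detected attacks get a fake error instead of a real route.
    if looks_like_attack(request["path"]):
        return {"handled_by": "deception", "body": "fake error"}
    # 3. Everything else reaches route matching and gets a random Server header.
    response = {"handled_by": "router", "headers": {}}
    response["headers"]["Server"] = random.choice(server_headers)
    return response
```

The ordering matters: a banned IP never reaches attack detection, and a detected attack never reaches the route handlers.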
### Prefix-Based Routing
Dashboard, API, and HTMX routers are mounted with `prefix=f"/{secret}"` in `app.py`. This means:
- Route handlers define paths **without** the secret (e.g., `@router.get("/api/all-ips")`)
- FastAPI prepends the secret automatically (e.g., `GET /a1b2c3/api/all-ips`)
- The honeypot catch-all `/{path:path}` only matches paths that **don't** start with the secret
- No `_is_dashboard_path()` checks needed — the prefix handles access scoping
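To make the scoping concrete, the sketch below mimics the prefix decision with a plain function. It is not the FastAPI router code: `pick_router` is a hypothetical helper, and returning `"unmatched"` for unknown paths under the secret is an assumption, since that case is not described here.

```python
def pick_router(path: str, secret: str) -> str:
    """Return which router would serve a path, following the priority order above."""
    prefix = f"/{secret}"
    if path == prefix or path.startswith(prefix + "/"):
        rest = path[len(prefix):]
        if rest.startswith("/static/"):
            return "static"
        if rest.startswith("/api/"):
            return "api"
        if rest.startswith("/htmx/"):
            return "htmx"
        if rest in ("", "/"):
            return "dashboard"
        return "unmatched"  # assumption: behaviour for this case is not documented
    # Anything outside the secret prefix falls through to the honeypot catch-all.
    return "honeypot"
```

Note that the prefix check requires an exact segment match, so a path such as `/a1b2c3x/api` is not treated as a dashboard path when the secret is `a1b2c3`.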
## Route Architecture
### Honeypot Routes (`routes/honeypot.py`)
| Method | Path | Response |
|--------|------|----------|
| `GET` | `/{path:path}` | Trap page with random links (catch-all) |
| `HEAD` | `/{path:path}` | 200 OK |
| `POST` | `/{path:path}` | Credential capture |
| `GET` | `/admin`, `/login` | Fake login form |
| `GET` | `/wp-admin`, `/wp-login.php` | Fake WordPress login |
| `GET` | `/phpmyadmin` | Fake phpMyAdmin |
| `GET` | `/robots.txt` | Honeypot paths advertised |
| `GET/POST` | `/api/search`, `/api/sql` | SQL injection honeypot |
| `POST` | `/api/contact` | XSS detection endpoint |
| `GET` | `/.env`, `/credentials.txt` | Fake sensitive files |
### Dashboard Routes (`routes/dashboard.py`)
| Method | Path | Response |
|--------|------|----------|
| `GET` | `/` | Server-rendered dashboard (Jinja2) |
### API Routes (`routes/api.py`)
| Method | Path | Response |
|--------|------|----------|
| `GET` | `/api/all-ips` | Paginated IP list with stats |
| `GET` | `/api/attackers` | Paginated attacker IPs |
| `GET` | `/api/ip-stats/{ip}` | Single IP detail |
| `GET` | `/api/credentials` | Captured credentials |
| `GET` | `/api/honeypot` | Honeypot trigger counts |
| `GET` | `/api/top-ips` | Top requesting IPs |
| `GET` | `/api/top-paths` | Most requested paths |
| `GET` | `/api/top-user-agents` | Top user agents |
| `GET` | `/api/attack-types-stats` | Attack type distribution |
| `GET` | `/api/attack-types` | Paginated attack log |
| `GET` | `/api/raw-request/{id}` | Full HTTP request |
| `GET` | `/api/get_banlist` | Export ban rules |
### HTMX Fragment Routes (`routes/htmx.py`)
Each returns a server-rendered Jinja2 partial (`hx-swap="innerHTML"`):
| Path | Template |
|------|----------|
| `/htmx/honeypot` | `honeypot_table.html` |
| `/htmx/top-ips` | `top_ips_table.html` |
| `/htmx/top-paths` | `top_paths_table.html` |
| `/htmx/top-ua` | `top_ua_table.html` |
| `/htmx/attackers` | `attackers_table.html` |
| `/htmx/credentials` | `credentials_table.html` |
| `/htmx/attacks` | `attack_types_table.html` |
| `/htmx/patterns` | `patterns_table.html` |
| `/htmx/ip-detail/{ip}` | `ip_detail.html` |
## Database Schema
```
┌─────────────────┐     ┌──────────────────┐
│ AccessLog       │     │ AttackDetection  │
├─────────────────┤     ├──────────────────┤
│ id (PK)         │◄────│ access_log_id(FK)│
│ ip (indexed)    │     │ attack_type      │
│ path            │     │ matched_pattern  │
│ user_agent      │     └──────────────────┘
│ method          │
│ is_suspicious   │     ┌──────────────────┐
│ is_honeypot     │     │CredentialAttempt │
│ timestamp       │     ├──────────────────┤
│ raw_request     │     │ id (PK)          │
└─────────────────┘     │ ip (indexed)     │
                        │ path, username   │
┌─────────────────┐     │ password         │
│ IpStats         │     │ timestamp        │
├─────────────────┤     └──────────────────┘
│ ip (PK)         │
│ total_requests  │     ┌──────────────────┐
│ first/last_seen │     │ CategoryHistory  │
│ country_code    │     ├──────────────────┤
│ city, lat, lon  │     │ id (PK)          │
│ asn, asn_org    │     │ ip (indexed)     │
│ isp, reverse    │     │ old_category     │
│ is_proxy        │     │ new_category     │
│ is_hosting      │     │ timestamp        │
│ list_on (JSON)  │     └──────────────────┘
│ category        │
│ category_scores │
│ analyzed_metrics│
│ manual_category │
└─────────────────┘
```
**SQLite config:** WAL mode, 30s busy timeout, file permissions 600.
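A minimal sketch of these three settings using only the standard `sqlite3` module follows. Krawl itself goes through SQLAlchemy, so this shows the underlying SQLite options rather than the project's `DatabaseManager`. The function name `open_database` is hypothetical, and the permission step assumes a POSIX file system.

```python
import os
import sqlite3


def open_database(path: str) -> sqlite3.Connection:
    """Open a SQLite file with WAL journaling, a 30 s busy timeout, and mode 600."""
    conn = sqlite3.connect(path, timeout=30)  # wait up to 30 s on a locked database
    conn.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout=30000")  # same timeout, expressed in ms
    os.chmod(path, 0o600)  # owner read/write only
    return conn
```

WAL mode suits this workload because the dashboard reads while the honeypot routes keep writing access logs.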
## Frontend Architecture
```
base.html
├── CDN: Leaflet, Chart.js, HTMX, Alpine.js (deferred)
├── Static: dashboard.css
└── dashboard/index.html (extends base)
├── Stats cards ──────────── Server-rendered on page load
├── Suspicious table ─────── Server-rendered on page load
├── Overview tab (Alpine.js x-show)
│ ├── Honeypot table ───── HTMX hx-get on load
│ ├── Top IPs table ────── HTMX hx-get on load
│ ├── Top Paths table ──── HTMX hx-get on load
│ ├── Top UA table ─────── HTMX hx-get on load
│ └── Credentials table ── HTMX hx-get on load
└── Attacks tab (Alpine.js x-show, lazy init)
├── Attackers table ──── HTMX hx-get on load
├── Map ──────────────── Leaflet (init on tab switch)
├── Chart ────────────── Chart.js (init on tab switch)
├── Attack types table ─ HTMX hx-get on load
└── Patterns table ───── HTMX hx-get on load
```
**Responsibility split:**
- **Alpine.js** — Tab state, modals, dropdowns, lazy initialization
- **HTMX** — Table pagination, sorting, IP detail expansion
- **Leaflet** — Interactive map with category-colored markers
- **Chart.js** — Doughnut chart for attack type distribution
- **Custom SVG** — Radar charts for IP category scores
## Background Tasks
Managed by `TasksMaster` (APScheduler). Tasks are auto-discovered from `src/tasks/`.
| Task | Schedule | Purpose |
|------|----------|---------|
| `analyze_ips` | Every 1 min | Score IPs into categories (attacker, crawler, user) |
| `fetch_ip_rep` | Every 5 min | Enrich IPs with geolocation + blocklist data |
| `db_dump` | Configurable | Export database backups |
| `memory_cleanup` | Periodic | Trim in-memory lists |
| `top_attacking_ips` | Periodic | Cache top attackers |
### IP Categorization Model
Each IP is scored across 4 categories based on:
- HTTP method distribution (risky methods ratio)
- Robots.txt violations
- Request timing anomalies (coefficient of variation)
- User-Agent diversity
- Attack URL detection
Categories: `attacker`, `bad_crawler`, `good_crawler`, `regular_user`, `unknown`
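Two of the indicators listed above are simple ratios, sketched below. This is an illustration under stated assumptions, not the code in `analyze_ips.py`: the `RISKY_METHODS` set and both function names are hypothetical, and timestamps are assumed to be sorted and in seconds. How the resulting values are compared against the configured thresholds (for example `KRAWL_HTTP_RISKY_METHODS_THRESHOLD`) is not described here, so the sketch only computes the metrics.

```python
from statistics import mean, pstdev

# Hypothetical choice of "risky" methods, for illustration only.
RISKY_METHODS = {"PUT", "DELETE", "TRACE", "CONNECT"}


def risky_method_ratio(methods: list[str]) -> float:
    """Share of requests that used a risky HTTP method."""
    if not methods:
        return 0.0
    return sum(m.upper() in RISKY_METHODS for m in methods) / len(methods)


def timing_cv(timestamps: list[float]) -> float:
    """Coefficient of variation (stddev / mean) of the gaps between requests."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return 0.0  # not enough data to say anything about timing
    return pstdev(gaps) / mean(gaps)
```

Perfectly regular requests give a coefficient of variation of 0, while bursty traffic gives a larger value.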
## Configuration
`config.yaml` with environment variable overrides (`KRAWL_{FIELD}`):
```yaml
server:
  port: 5000
  delay: 100 # Response delay (ms)
dashboard:
  secret_path: "test" # Auto-generates if null
database:
  path: "data/krawl.db"
  retention_days: 30
crawl:
  infinite_pages_for_malicious: true
  max_pages_limit: 250
  ban_duration_seconds: 600
behavior:
  probability_error_codes: 0 # 0-100%
canary:
  token_url: null # External canary alert URL
```
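The override rule can be sketched as follows: a value from the environment wins over the value loaded from `config.yaml`. The helpers `resolve` and `parse_range` are hypothetical names for this example and not the functions in `src/config.py`. The `min,max` format is the one used by variables such as `KRAWL_LINKS_PER_PAGE_RANGE`.

```python
from typing import Any, Callable, Mapping


def resolve(env_name: str, yaml_value: Any, env: Mapping[str, str],
            cast: Callable[[str], Any] = str) -> Any:
    """Return the environment value if the variable is set, else the YAML value."""
    raw = env.get(env_name)
    if raw is None:
        return yaml_value
    return cast(raw)  # environment values are strings, so convert them


def parse_range(raw: str) -> tuple[int, int]:
    """Parse a 'min,max' string such as '5,25' into a pair of integers."""
    low, high = (int(part) for part in raw.split(","))
    return low, high
```

For example, `resolve("KRAWL_PORT", 5000, os.environ, int)` would return the YAML default unless `KRAWL_PORT` is exported.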
## Logging
Three rotating log files (1MB max, 5 backups each):
| Logger | File | Content |
|--------|------|---------|
| `krawl.app` | `logs/krawl.log` | Application events, errors |
| `krawl.access` | `logs/access.log` | HTTP access, attack detections |
| `krawl.credentials` | `logs/credentials.log` | Captured login attempts |
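A rotating file logger with these limits can be built with the standard library, as in the sketch below. The helper name `build_logger`, the log format, and the reading of "1MB" as 1024 * 1024 bytes are assumptions, not the contents of `src/logger.py`.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(name: str, log_file: str) -> logging.Logger:
    """Create a logger that rotates its file at about 1 MB and keeps 5 backups."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_file,
        maxBytes=1024 * 1024,  # assumption: "1MB" means 1 MiB
        backupCount=5,
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Calling `build_logger("krawl.app", "logs/krawl.log")` would produce the first row of the table. Once the file reaches the size limit it is renamed to `krawl.log.1`, and older backups shift up to `krawl.log.5`.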
## Docker
```dockerfile
FROM python:3.13-slim
# Non-root user: krawl:1000
# Volumes: /app/logs, /app/data, /app/exports
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "5000", "--app-dir", "src"]
```
## Key Data Flows
### Honeypot Request
```
Client → BanCheck → DeceptionMiddleware → HoneypotRouter
┌─────────┴──────────┐
│ tracker.record() │
│ ├─ in-memory ++ │
│ ├─ detect attacks │
│ └─ DB persist │
└────────────────────┘
```
### Dashboard Load
```
Browser → GET /{secret}/ → SSR initial stats + Jinja2 render
→ Alpine.js init → HTMX fires hx-get for each table
→ User clicks Attacks tab → setTimeout → init Leaflet + Chart.js
→ Leaflet fetches /api/all-ips → plots markers
→ Chart.js fetches /api/attack-types-stats → renders doughnut
```
### IP Enrichment Pipeline
```
APScheduler (every 5 min)
└─ fetch_ip_rep.main()
├─ DB: get unenriched IPs (limit 50)
├─ ip-api.com → geolocation (country, city, ASN, coords)
├─ iprep.lcrawl.com → blocklist memberships
└─ DB: update IpStats with enriched data
```

10
docs/backups.md Normal file
View File

@@ -0,0 +1,10 @@
# Enable Database Dump Job for Backups
To enable the database dump job, set the following variables (*config file example*)
```yaml
backups:
  path: "backups" # where backups will be saved
  cron: "*/30 * * * *" # frequency of the cron job
  enabled: true
```
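
For illustration, a dump job could be as small as the sketch below. It assumes the database is SQLite (suggested by the `data/krawl.db` path) and writes a timestamped SQL dump into the backup directory; the function name and file naming are hypothetical and not taken from Krawl's `db_dump` task.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def dump_database(db_path: str, backup_dir: str) -> Path:
    """Write a plain-SQL dump of the SQLite file at db_path into backup_dir."""
    out_dir = Path(backup_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = out_dir / f"krawl-{stamp}.sql"
    conn = sqlite3.connect(db_path)
    try:
        with target.open("w", encoding="utf-8") as fh:
            # iterdump() yields the SQL statements needed to rebuild the DB.
            for statement in conn.iterdump():
                fh.write(statement + "\n")
    finally:
        conn.close()
    return target
```

With the configuration above, a scheduler would call something like `dump_database("data/krawl.db", "backups")` every 30 minutes.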

10
docs/canary-token.md Normal file
View File

@@ -0,0 +1,10 @@
# Customizing the Canary Token
To create a custom canary token, visit https://canarytokens.org
and generate a "Web bug" canary token.
This optional token comes into play once a crawler has fully traversed the generated pages, that is, once its remaining-tries counter reaches 0. At that point, Krawl returns the canary URL. When this URL is requested, the canary service emails you an alert that includes the visitor's IP address and user agent.
To enable this feature, set the canary token URL [using the environment variable](../README.md#configuration-via-enviromental-variables) `KRAWL_CANARY_TOKEN_URL`.

21
docs/dashboard.md Normal file
View File

@@ -0,0 +1,21 @@
# Dashboard
Access the dashboard at `http://<server-ip>:<port>/<dashboard-path>`
The dashboard shows:
- Total and unique accesses
- Suspicious activity and attack detection
- Top IPs, paths, user-agents and GeoIP localization
- Real-time monitoring
The attackers' access to the honeypot endpoint and related suspicious activities (such as failed login attempts) are logged.
Krawl also implements a scoring system designed to distinguish between malicious and legitimate behavior on the website.
![dashboard-1](../img/dashboard-1.png)
The top IP addresses are shown along with the top paths and User-Agents.
![dashboard-2](../img/dashboard-2.png)
![dashboard-3](../img/dashboard-3.png)

View File

@@ -0,0 +1,50 @@
# Firewall exporters documentation
The firewall export feature is implemented through a strategy pattern: an abstract class defines the interface, and each subclass implements the export logic for a specific firewall system:
```mermaid
classDiagram
class FWType{
+getBanlist()
}
FWType <|-- Raw
class Raw{ }
FWType <|-- Iptables
class Iptables{ }
note for Iptables "implements the getBanlist method for iptables rules"
```
Rule sets are generated by the `top_attacking_ips__export-malicious-ips` task, which writes the files to the directory set by the `exports_path` configuration option. Files are named after the firewall they target, as `[firewall]_banlist.txt`, except for the raw list, which is called `malicious_ips.txt` for backward compatibility.
## Adding firewall exporters
To add a firewall exporter, create a new Python class in `src/firewall` that implements the `FWType` class.
> Example with a `Yourfirewall` class in the `yourfirewall.py` file:
```python
from typing_extensions import override

from firewall.fwtype import FWType


class Yourfirewall(FWType):
    @override
    def getBanlist(self, ips) -> str:
        """
        Generate raw list of bad IP addresses.

        Args:
            ips: List of IP addresses to ban

        Returns:
            String containing raw ips, one per line
        """
        if not ips:
            return ""
        # Add here code implementation
```
Then add the following import to `src/server.py` and `src/tasks/top_attacking_ips.py`:
```python
from firewall.yourfirewall import Yourfirewall
```
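
As a worked example of the pattern, the sketch below shows what a concrete exporter might look like for `ipset`. The `FWType` class here is a local stand-in so the snippet runs on its own; the `Ipset` class, the `krawl-ban` set name, and the choice to skip invalid and IPv6 entries are assumptions, not part of Krawl.

```python
import ipaddress
from abc import ABC, abstractmethod


class FWType(ABC):
    """Stand-in for firewall.fwtype.FWType so this example is self-contained."""

    @abstractmethod
    def getBanlist(self, ips) -> str:
        ...


class Ipset(FWType):
    """Hypothetical exporter producing lines in `ipset restore` format."""

    SET_NAME = "krawl-ban"  # assumed set name

    def getBanlist(self, ips) -> str:
        if not ips:
            return ""
        lines = []
        for ip in ips:
            try:
                address = ipaddress.ip_address(ip)
            except ValueError:
                continue  # skip anything that is not a valid IP address
            if address.version == 4:  # a default hash:ip set holds IPv4 only
                lines.append(f"add {self.SET_NAME} {address}")
        return "\n".join(lines)
```

In Krawl itself the class would inherit from the real `FWType` and be imported as described above.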

52
docs/honeypot.md Normal file
View File

@@ -0,0 +1,52 @@
# Honeypot
Below is a complete overview of the Krawl honeypot's capabilities
## robots.txt
The actual (juicy) robots.txt configuration [is the following](../src/templates/html/robots.txt).
## Honeypot pages
### Common Login Attempts
Requests to common admin endpoints (`/admin/`, `/wp-admin/`, `/phpMyAdmin/`) return a fake login page. Any login attempt triggers a 1-second delay to simulate real processing and is fully logged in the dashboard (credentials, IP, headers, timing).
![admin page](../img/admin-page.png)
### Common Misconfiguration Paths
Requests to paths like `/backup/`, `/config/`, `/database/`, `/private/`, or `/uploads/` return a fake directory listing populated with "interesting" files, each assigned a random file size to look realistic.
![directory-page](../img/directory-page.png)
### Environment File Leakage
The `.env` endpoint exposes fake database connection strings, **AWS API keys**, and **Stripe secrets**. It intentionally returns an error due to the `Content-Type` being `application/json` instead of plain text, mimicking a "juicy" misconfiguration that crawlers and scanners often flag as information leakage.
### Server Error Information
The `/server` page displays randomly generated fake error information for each known server.
![server and env page](../img/server-and-env-page.png)
### API Endpoints with Sensitive Data
The pages `/api/v1/users` and `/api/v2/secrets` show fake users and random secrets in JSON format
![users and secrets](../img/users-and-secrets.png)
### Exposed Credential Files
The pages `/credentials.txt` and `/passwords.txt` show fake users and random secrets
![credentials and passwords](../img/credentials-and-passwords.png)
### SQL Injection and XSS Detection
Pages such as `/users`, `/search`, `/contact`, `/info`, `/input`, and `/feedback`, along with APIs like `/api/sql` and `/api/database`, are designed to lure attackers into performing attacks such as **SQL injection** or **XSS**.
![sql injection](../img/sql_injection.png)
Automated tools like **SQLMap** will receive a different randomized database error on each request, increasing scan noise and confusing the attacker. All detected attacks are logged and displayed in the dashboard.
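
The randomized-error idea can be sketched as follows. The templates are a small subset of the `sql_errors` wordlist; the function name and the way an engine is picked are assumptions for illustration only.

```python
import random

# Subset of the error templates from the sql_errors wordlist.
SQL_ERRORS = {
    "mysql": ["You have an error in your SQL syntax", "Table '{table}' doesn't exist"],
    "postgresql": ['ERROR: relation "{table}" does not exist', "ERROR: syntax error at or near"],
    "sqlite": ["no such table: {table}", "no such column: {column}"],
}


def random_sql_error(rng: random.Random, table: str = "users", column: str = "id") -> str:
    """Pick a random engine, then a random error template, and fill it in."""
    engine = rng.choice(sorted(SQL_ERRORS))
    template = rng.choice(SQL_ERRORS[engine])
    # Unused keyword arguments are simply ignored by str.format().
    return template.format(table=table, column=column)
```

Because the engine changes between requests, a scanner cannot settle on a single database fingerprint.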
### Path Traversal Detection
Krawl detects and responds to **path traversal** attempts targeting common system files like `/etc/passwd`, `/etc/shadow`, or Windows system paths. When an attacker tries to access sensitive files using patterns like `../../../etc/passwd` or encoded variants (`%2e%2e/`, `%252e`), Krawl returns convincing fake file contents with realistic system users, UIDs, GIDs, and shell configurations. This wastes attacker time while logging the full attack pattern.
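
Encoded variants such as `%2e%2e/` or `%252e` only become visible after one or two rounds of URL decoding. The sketch below shows one way to handle that; the regex is a simplified version of the `path_traversal` pattern and the function is hypothetical, not Krawl's detector.

```python
import re
from urllib.parse import unquote

# Simplified traversal pattern: ../, ..\ and two well-known target files.
TRAVERSAL = re.compile(r"\.\./|\.\.\\|/etc/passwd|/etc/shadow", re.IGNORECASE)


def is_path_traversal(raw_path: str, max_rounds: int = 3) -> bool:
    """Check the path, then re-check after each round of percent-decoding."""
    candidate = raw_path
    for _ in range(max_rounds):
        if TRAVERSAL.search(candidate):
            return True
        decoded = unquote(candidate)
        if decoded == candidate:
            break  # nothing left to decode
        candidate = decoded
    return bool(TRAVERSAL.search(candidate))
```

Capping the number of decoding rounds keeps deeply nested encodings from turning into unbounded work.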
### XXE (XML External Entity) Injection
The `/api/xml` and `/api/parser` endpoints accept XML input and are designed to detect **XXE injection** attempts. When attackers try to exploit external entity declarations (`<!ENTITY`, `<!DOCTYPE`, `SYSTEM`) or reference entities to access local files, Krawl responds with realistic XML responses that appear to process the entities successfully. The honeypot returns fake file contents, simulated entity values (like `admin_credentials` or `database_connection`), or realistic error messages, making the attack appear successful while fully logging the payload.
### Command Injection Detection
Pages like `/api/exec`, `/api/run`, and `/api/system` simulate command execution endpoints vulnerable to **command injection**. When attackers attempt to inject shell commands using patterns like `; whoami`, `| cat /etc/passwd`, or backticks, Krawl responds with realistic command outputs. For example, `whoami` returns fake usernames like `www-data` or `nginx`, while `uname` returns fake Linux kernel versions. Network commands like `wget` or `curl` simulate downloads or return "command not found" errors, creating believable responses that delay and confuse automated exploitation tools.

25
docs/reverse-proxy.md Normal file
View File

@@ -0,0 +1,25 @@
# Example Usage Behind Reverse Proxy
You can configure a reverse proxy so that all web requests land on the Krawl page by default, and hide your real content behind a secret URL. For example, with nginx:
```nginx
location / {
    proxy_pass https://your-krawl-instance;
    proxy_pass_header Server;
}
location /my-hidden-service {
    proxy_pass https://my-hidden-service;
    proxy_pass_header Server;
}
```
Alternatively, you can create a bunch of different "interesting" looking domains. For example:
- admin.example.com
- portal.example.com
- sso.example.com
- login.example.com
- ...
Additionally, you may configure your reverse proxy to forward all non-existing subdomains (e.g. nonexistent.example.com) to one of these domains so that any crawlers that are guessing domains at random will automatically end up at your Krawl instance.

22
docs/wordlist.md Normal file
View File

@@ -0,0 +1,22 @@
# Customizing the Wordlist
Edit `wordlists.json` to customize the fake data for your use case:
```json
{
"usernames": {
"prefixes": ["admin", "root", "user"],
"suffixes": ["_prod", "_dev", "123"]
},
"passwords": {
"prefixes": ["P@ssw0rd", "Admin"],
"simple": ["test", "password"]
},
"directory_listing": {
"files": ["credentials.txt", "backup.sql"],
"directories": ["admin/", "backup/"]
}
}
```
For Helm chart installations, edit the `wordlists` section of **values.yaml** instead.
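
To show how such a wordlist might be consumed, the sketch below builds a fake username by joining a random prefix and suffix. The combination rule and the function name are assumptions; this document does not describe how Krawl actually assembles the values.

```python
import json
import random

# Same structure as the usernames section of wordlists.json above.
WORDLISTS = json.loads("""
{
  "usernames": {
    "prefixes": ["admin", "root", "user"],
    "suffixes": ["_prod", "_dev", "123"]
  }
}
""")


def fake_username(wordlists: dict, rng: random.Random) -> str:
    """Return prefix + suffix, each drawn at random (assumed combination rule)."""
    names = wordlists["usernames"]
    return rng.choice(names["prefixes"]) + rng.choice(names["suffixes"])
```

Adding more prefixes and suffixes multiplies the number of distinct usernames an attacker can be shown.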

View File

@@ -2,8 +2,8 @@ apiVersion: v2
name: krawl-chart
description: A Helm chart for Krawl honeypot server
type: application
version: 1.0.0
appVersion: 1.0.0
version: 1.1.0
appVersion: 1.1.0
keywords:
- honeypot
- security
@@ -13,4 +13,4 @@ maintainers:
home: https://github.com/blessedrebus/krawl
sources:
- https://github.com/blessedrebus/krawl
icon: https://raw.githubusercontent.com/blessedrebus/krawl/main/img/krawl-svg.svg
icon: https://raw.githubusercontent.com/blessedrebus/krawl/main/img/krawl-svg.svg

View File

@@ -10,103 +10,31 @@ A Helm chart for deploying the Krawl honeypot application on Kubernetes.
## Installation
### Helm Chart
Install with default values:
### From OCI Registry
```bash
helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
--version 1.0.0 \
--namespace krawl-system \
--create-namespace
```
Or create a minimal `values.yaml` file:
```yaml
service:
type: LoadBalancer
port: 5000
timezone: "Europe/Rome"
ingress:
enabled: true
className: "traefik"
hosts:
- host: krawl.example.com
paths:
- path: /
pathType: Prefix
config:
server:
port: 5000
delay: 100
dashboard:
secret_path: null # Auto-generated if not set
database:
persistence:
enabled: true
size: 1Gi
```
Install with custom values:
```bash
helm install krawl oci://ghcr.io/blessedrebus/krawl-chart \
--version 0.2.2 \
--version 1.1.0 \
--namespace krawl-system \
--create-namespace \
-f values.yaml
-f values.yaml # optional
```
To access the deception server:
### From local chart
```bash
helm install krawl ./helm -n krawl-system --create-namespace -f values.yaml
```
A minimal [values.yaml](values-minimal.yaml) example is provided in this directory.
Once installed, get your service IP:
```bash
kubectl get svc krawl -n krawl-system
```
Once the EXTERNAL-IP is assigned, access your deception server at `http://<EXTERNAL-IP>:5000`
### Add the repository (if applicable)
```bash
helm repo add krawl https://github.com/BlessedRebuS/Krawl
helm repo update
```
### Install from OCI Registry
```bash
helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 0.2.1
```
Or with a specific namespace:
```bash
helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 0.2.1 -n krawl --create-namespace
```
### Install the chart locally
```bash
helm install krawl ./helm
```
### Install with custom values
```bash
helm install krawl ./helm -f values.yaml
```
### Install in a specific namespace
```bash
helm install krawl ./helm -n krawl --create-namespace
```
Then access the deception server at `http://<EXTERNAL-IP>:5000`
## Configuration
@@ -221,16 +149,6 @@ The following table lists the main configuration parameters of the Krawl chart a
| `resources.requests.cpu` | CPU request | `100m` |
| `resources.requests.memory` | Memory request | `64Mi` |
### Autoscaling
| Parameter | Description | Default |
|-----------|-------------|---------|
| `autoscaling.enabled` | Enable horizontal pod autoscaling | `false` |
| `autoscaling.minReplicas` | Minimum replicas | `1` |
| `autoscaling.maxReplicas` | Maximum replicas | `1` |
| `autoscaling.targetCPUUtilizationPercentage` | Target CPU utilization | `70` |
| `autoscaling.targetMemoryUtilizationPercentage` | Target memory utilization | `80` |
### Network Policy
| Parameter | Description | Default |
@@ -248,68 +166,24 @@ kubectl get secret krawl-server -n krawl-system \
## Usage Examples
### Basic Installation
You can override individual values with `--set` without a values file:
```bash
helm install krawl ./helm
```
### Installation with Custom Domain
```bash
helm install krawl ./helm \
--set ingress.hosts[0].host=honeypot.example.com
```
### Enable Canary Tokens
```bash
helm install krawl ./helm \
helm install krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.0 \
--set ingress.hosts[0].host=honeypot.example.com \
--set config.canary.token_url=https://canarytokens.com/your-token
```
### Configure Custom API Endpoint
```bash
helm install krawl ./helm \
--set config.api.server_url=https://api.example.com \
--set config.api.server_port=443
```
### Create Values Override File
Create `custom-values.yaml`:
```yaml
config:
server:
port: 8080
delay: 500
canary:
token_url: https://your-canary-token-url
dashboard:
secret_path: /super-secret-path
crawl:
max_pages_limit: 500
ban_duration_seconds: 3600
```
Then install:
```bash
helm install krawl ./helm -f custom-values.yaml
```
## Upgrading
```bash
helm upgrade krawl ./helm
helm upgrade krawl oci://ghcr.io/blessedrebus/krawl-chart --version 1.1.0 -f values.yaml
```
## Uninstalling
```bash
helm uninstall krawl
helm uninstall krawl -n krawl-system
```
## Troubleshooting
@@ -348,7 +222,6 @@ kubectl logs -l app.kubernetes.io/name=krawl
- `configmap.yaml` - Application configuration
- `pvc.yaml` - Persistent volume claim
- `ingress.yaml` - Ingress configuration
- `hpa.yaml` - Horizontal pod autoscaler
- `network-policy.yaml` - Network policies
## Support

View File

@@ -22,6 +22,14 @@ data:
token_tries: {{ .Values.config.canary.token_tries }}
dashboard:
secret_path: {{ .Values.config.dashboard.secret_path | toYaml }}
backups:
path: {{ .Values.config.backups.path | quote }}
cron: {{ .Values.config.backups.cron | quote }}
enabled: {{ .Values.config.backups.enabled }}
exports:
path: {{ .Values.config.exports.path | quote }}
logging:
level: {{ .Values.config.logging.level | quote }}
database:
path: {{ .Values.config.database.path | quote }}
retention_days: {{ .Values.config.database.retention_days }}

View File

@@ -5,9 +5,9 @@ metadata:
labels:
{{- include "krawl.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
strategy:
type: Recreate
selector:
matchLabels:
{{- include "krawl.selectorLabels" . | nindent 6 }}
@@ -29,7 +29,7 @@ spec:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: {{ .Chart.Name }}
- name: krawl
{{- with .Values.securityContext }}
securityContext:
{{- toYaml . | nindent 12 }}

View File

@@ -1,32 +0,0 @@
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: {{ include "krawl.fullname" . }}
labels:
{{- include "krawl.labels" . | nindent 4 }}
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ include "krawl.fullname" . }}
minReplicas: {{ .Values.autoscaling.minReplicas }}
maxReplicas: {{ .Values.autoscaling.maxReplicas }}
metrics:
{{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
{{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
{{- end }}
{{- end }}

View File

@@ -3,7 +3,7 @@ replicaCount: 1
image:
repository: ghcr.io/blessedrebus/krawl
pullPolicy: Always
tag: "1.0.0"
tag: "1.1.0"
imagePullSecrets: []
nameOverride: "krawl"
@@ -54,13 +54,6 @@ resources:
# If not set, container will use its default timezone
timezone: ""
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 1
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
nodeSelector: {}
tolerations: []
@@ -84,6 +77,14 @@ config:
token_tries: 10
dashboard:
secret_path: null # Auto-generated if not set, or set to "/my-secret-dashboard"
backups:
path: "backups"
enabled: false
cron: "*/30 * * * *"
exports:
path: "exports"
logging:
level: "INFO" # DEBUG, INFO, WARNING, ERROR, CRITICAL
database:
path: "data/krawl.db"
retention_days: 30
@@ -307,6 +308,295 @@ wordlists:
- .git/
- keys/
- credentials/
fake_files:
- name: settings.conf
size_min: 1024
size_max: 8192
perms: "-rw-r--r--"
- name: database.sql
size_min: 10240
size_max: 102400
perms: "-rw-r--r--"
- name: .htaccess
size_min: 256
size_max: 1024
perms: "-rw-r--r--"
- name: README.md
size_min: 512
size_max: 2048
perms: "-rw-r--r--"
fake_directories:
- name: config
size: "4096"
perms: drwxr-xr-x
- name: backup
size: "4096"
perms: drwxr-xr-x
- name: logs
size: "4096"
perms: drwxrwxr-x
- name: data
size: "4096"
perms: drwxr-xr-x
fake_passwd:
system_users:
- "root:x:0:0:root:/root:/bin/bash"
- "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin"
- "bin:x:2:2:bin:/bin:/usr/sbin/nologin"
- "sys:x:3:3:sys:/dev:/usr/sbin/nologin"
- "sync:x:4:65534:sync:/bin:/bin/sync"
- "www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin"
- "backup:x:34:34:backup:/var/backups:/usr/sbin/nologin"
- "mysql:x:108:113:MySQL Server,,,:/nonexistent:/bin/false"
- "sshd:x:109:65534::/run/sshd:/usr/sbin/nologin"
uid_min: 1000
uid_max: 2000
gid_min: 1000
gid_max: 2000
shells:
- /bin/bash
- /bin/sh
- /usr/bin/zsh
fake_shadow:
system_entries:
- "root:$6$rounds=656000$fake_salt_here$fake_hash_data:19000:0:99999:7:::"
- "daemon:*:19000:0:99999:7:::"
- "bin:*:19000:0:99999:7:::"
- "sys:*:19000:0:99999:7:::"
- "www-data:*:19000:0:99999:7:::"
hash_prefix: "$6$rounds=656000$"
salt_length: 16
hash_length: 86
xxe_responses:
file_access:
template: |
<?xml version="1.0"?>
<response>
<status>success</status>
<data>{content}</data>
</response>
entity_processed:
template: |
<?xml version="1.0"?>
<response>
<status>success</status>
<message>Entity processed successfully</message>
<entity_value>{entity_value}</entity_value>
</response>
entity_values:
- "admin_credentials"
- "database_connection"
- "api_secret_key"
- "internal_server_ip"
- "encrypted_password"
error:
template: |
<?xml version="1.0"?>
<response>
<status>error</status>
<message>{message}</message>
</response>
messages:
- "External entity not allowed"
- "XML parsing error"
- "Invalid entity reference"
default_content: "root:x:0:0:root:/root:/bin/bash\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin"
command_outputs:
id:
- "uid={uid}(www-data) gid={gid}(www-data) groups={gid}(www-data)"
- "uid={uid}(nginx) gid={gid}(nginx) groups={gid}(nginx)"
- "uid={uid}(apache) gid={gid}(apache) groups={gid}(apache)"
whoami:
- www-data
- nginx
- apache
- webapp
- nobody
uname:
- "Linux webserver 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux"
- "Linux app-server 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 GNU/Linux"
- "Linux prod-server 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 GNU/Linux"
pwd:
- /var/www/html
- /home/webapp/public_html
- /usr/share/nginx/html
- /opt/app/public
ls:
- ["index.php", "config.php", "uploads", "assets", "README.md", ".htaccess", "admin"]
- ["app.js", "package.json", "node_modules", "public", "views", "routes"]
- ["index.html", "css", "js", "images", "data", "api"]
cat_config: |
<?php
// Configuration file
$db_host = 'localhost';
$db_user = 'webapp';
$db_pass = 'fake_password';
?>
network_commands:
- "bash: wget: command not found"
- "curl: (6) Could not resolve host: example.com"
- "Connection timeout"
- "bash: nc: command not found"
- "Downloaded {size} bytes"
generic:
- "sh: 1: syntax error: unexpected end of file"
- "Command executed successfully"
- ""
- "/bin/sh: {num}: not found"
- "bash: command not found"
uid_min: 1000
uid_max: 2000
gid_min: 1000
gid_max: 2000
download_size_min: 100
download_size_max: 10000
sql_errors:
mysql:
syntax_errors:
- "You have an error in your SQL syntax"
- "check the manual that corresponds to your MySQL server version"
table_errors:
- "Table '{table}' doesn't exist"
- "Unknown table '{table}'"
column_errors:
- "Unknown column '{column}' in 'field list'"
- "Unknown column '{column}' in 'where clause'"
postgresql:
syntax_errors:
- "ERROR: syntax error at or near"
- "ERROR: unterminated quoted string"
relation_errors:
- "ERROR: relation \"{table}\" does not exist"
column_errors:
- "ERROR: column \"{column}\" does not exist"
mssql:
syntax_errors:
- "Incorrect syntax near"
- "Unclosed quotation mark"
object_errors:
- "Invalid object name '{table}'"
column_errors:
- "Invalid column name '{column}'"
oracle:
syntax_errors:
- "ORA-00933: SQL command not properly ended"
- "ORA-00904: invalid identifier"
table_errors:
- "ORA-00942: table or view does not exist"
sqlite:
syntax_errors:
- "near \"{token}\": syntax error"
table_errors:
- "no such table: {table}"
column_errors:
- "no such column: {column}"
mongodb:
query_errors:
- "Failed to parse"
- "unknown operator"
collection_errors:
- "ns not found"
server_errors:
nginx:
versions:
- "1.18.0"
- "1.20.1"
- "1.22.0"
- "1.24.0"
template: |
<!DOCTYPE html>
<html>
<head>
<title>{code} {message}</title>
<style>
body {{
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}}
</style>
</head>
<body>
<h1>An error occurred.</h1>
<p>Sorry, the page you are looking for is currently unavailable.<br/>
Please try again later.</p>
<p>If you are the system administrator of this resource then you should check the error log for details.</p>
<p><em>Faithfully yours, nginx/{version}.</em></p>
</body>
</html>
apache:
versions:
- "2.4.41"
- "2.4.52"
- "2.4.54"
- "2.4.57"
os:
- Ubuntu
- Debian
- CentOS
template: |
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>{code} {message}</title>
</head><body>
<h1>{message}</h1>
<p>The requested URL was not found on this server.</p>
<hr>
<address>Apache/{version} ({os}) Server at {host} Port 80</address>
</body></html>
iis:
versions:
- "10.0"
- "8.5"
- "8.0"
template: |
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<title>{code} - {message}</title>
</head>
<body>
<div id="header"><h1>Server Error</h1></div>
<div id="content">
<h2>{code} - {message}</h2>
<h3>The page cannot be displayed because an internal server error has occurred.</h3>
</div>
</body>
</html>
attack_patterns:
path_traversal: "(\\.\\.| %2e%2e|%252e|/etc/passwd|/etc/shadow|\\.\\.\\\\/|\\.\\./|/windows/system32|c:\\\\windows|/proc/self|\\.\\.\\.%2f|\\.\\.\\.%5c|etc/passwd|etc/shadow)"
sql_injection: "('|\"|`|--|#|/\\*|\\*/|\\bunion\\b|\\bunion\\s+select\\b|\\bor\\b.*=.*|\\band\\b.*=.*|'.*or.*'.*=.*'|\\bsleep\\b|\\bwaitfor\\b|\\bdelay\\b|\\bbenchmark\\b|;.*select|;.*drop|;.*insert|;.*update|;.*delete|\\bexec\\b|\\bexecute\\b|\\bxp_cmdshell\\b|information_schema|table_schema|table_name)"
xss_attempt: "(<script|</script|javascript:|onerror=|onload=|onclick=|onmouseover=|onfocus=|onblur=|<iframe|<img|<svg|<embed|<object|<body|<input|eval\\(|alert\\(|prompt\\(|confirm\\(|document\\.|window\\.|<style|expression\\(|vbscript:|data:text/html)"
lfi_rfi: "(file://|php://|expect://|data://|zip://|phar://|/etc/passwd|/etc/shadow|/proc/self|c:\\\\windows)"
xxe_injection: "(<!ENTITY|<!DOCTYPE|SYSTEM\\s+[\"']|PUBLIC\\s+[\"']|&\\w+;|file://|php://filter|expect://)"
ldap_injection: "(\\*\\)|\\(\\||\\(&)"
command_injection: "(cmd=|exec=|command=|execute=|system=|ping=|host=|&&|\\|\\||;|\\$\\{|\\$\\(|`|\\bid\\b|\\bwhoami\\b|\\buname\\b|\\bcat\\b|\\bls\\b|\\bpwd\\b|\\becho\\b|\\bwget\\b|\\bcurl\\b|\\bnc\\b|\\bnetcat\\b|\\bbash\\b|\\bsh\\b|\\bps\\b|\\bkill\\b|\\bchmod\\b|\\bchown\\b|\\bcp\\b|\\bmv\\b|\\brm\\b|/bin/bash|/bin/sh|cmd\\.exe|/bin/|/usr/bin/|/sbin/)"
common_probes: "(/admin|/wp-admin|/phpMyAdmin|/phpmyadmin|/feedback|\\.env|/credentials\\.txt|/passwords\\.txt|\\.git|/backup\\.sql|/db_backup\\.sql)"
suspicious_patterns:
- sqlmap
- nessus
- burp
- zap
- metasploit
- nuclei
- gobuster
- dirbuster
credential_fields:
username_fields:
- username
- user
- login
- email
- log
- userid
- account
password_fields:
- password
- pass
- passwd
- pwd
- passphrase
server_headers:
- Apache/2.2.22 (Ubuntu)
- nginx/1.18.0

img/attack_types.png: new binary file, 97 KiB (not shown)

Binary file changed: 179 KiB before, 353 KiB after (not shown)

New binary file: 106 KiB (not shown)

View File

@@ -68,6 +68,14 @@ data:
token_tries: 10
dashboard:
secret_path: null
backups:
path: "backups"
cron: "*/30 * * * *"
enabled: false
exports:
path: "exports"
logging:
level: "INFO"
database:
path: "data/krawl.db"
retention_days: 30
@@ -154,6 +162,8 @@ metadata:
app.kubernetes.io/version: "1.0.0"
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: krawl

View File

@@ -26,6 +26,14 @@ data:
token_tries: 10
dashboard:
secret_path: null
backups:
path: "backups"
cron: "*/30 * * * *"
enabled: false
exports:
path: "exports"
logging:
level: "INFO"
database:
path: "data/krawl.db"
retention_days: 30

View File

@@ -10,6 +10,8 @@ metadata:
app.kubernetes.io/version: "1.0.0"
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app.kubernetes.io/name: krawl

View File

@@ -10,6 +10,5 @@ resources:
- service.yaml
- network-policy.yaml
- ingress.yaml
- hpa.yaml
namespace: krawl-system

155
plugins/fail2ban/README.md Normal file
View File

@@ -0,0 +1,155 @@
# Fail2Ban + Krawl Integration
## Overview
This guide explains how to integrate **[fail2ban](https://github.com/fail2ban/fail2ban)** with Krawl to automatically block detected malicious IPs at the firewall level using iptables. Fail2ban monitors Krawl's malicious IP export and applies real-time IP bans.
## Architecture
```
Krawl detects malicious IPs
Writes to malicious_ips.txt
Fail2ban monitors the file
Filter matches IPs using regex
Iptables firewall blocks the IP
Auto-unban after bantime expires
```
## Prerequisites
- Linux system with iptables
- Fail2ban installed: `sudo apt-get install fail2ban`
- Krawl running and generating malicious IPs
- Root/sudo access
## Installation & Setup
### 1. Create the Filter Configuration [krawl-filter.conf](krawl-filter.conf)
Create `/etc/fail2ban/filter.d/krawl-filter.conf`:
```ini
[Definition]
failregex = ^<HOST>$
```
**Explanation:** The filter matches any line that consists solely of a host (`<HOST>` is fail2ban's placeholder for an IP address or hostname). This works because Krawl's export lists the detected attackers **one IP per line**.
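
Since the filter relies on every line being a bare address, it can help to sanity-check the export before pointing fail2ban at it. The helper below is a hypothetical sketch and is stricter than `<HOST>`, which also accepts hostnames.

```python
import ipaddress


def invalid_lines(text: str) -> list:
    """Return (line_number, line) pairs for lines that are not a bare IP."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        try:
            # Raises ValueError on hostnames, blanks, and padded addresses.
            ipaddress.ip_address(line)
        except ValueError:
            bad.append((number, line))
    return bad
```

An empty result means every line is a plain IPv4 or IPv6 address with no surrounding whitespace.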
### 2. Create the Jail Configuration [krawl-jail.conf](krawl-jail.conf)
### 2.1 Krawl is on the same host
Create `/etc/fail2ban/jail.d/krawl-jail.conf` and replace the `logpath` with the path to the krawl `malicious_ips.txt`:
```ini
[krawl]
enabled = true
filter = krawl-filter
logpath = /path/to/malicious_ips.txt
backend = auto
maxretry = 1
findtime = 1
bantime = 2592000
action = iptables-allports[name=krawl-ban, port=all, protocol=all]
```
### 2.2 Krawl is on a different host
If Krawl is deployed on another instance, you can use the Krawl API to get malicious IPs via a **curl** command scheduled with **cron**.
```bash
curl "http://your-krawl-instance/dashboard-path/api/get_banlist?fwtype=raw" -o malicious_ips.txt
```
#### Cron Setup
Edit your crontab to refresh the malicious IPs list:
```bash
sudo crontab -e
```
Add this single cron job to fetch malicious IPs every hour:
```bash
0 * * * * curl "http://your-krawl-instance/dashboard-path/api/get_banlist?fwtype=raw" -o /tmp/malicious_ips.txt
```
Replace the `krawl-jail.conf` **logpath** with `/tmp/malicious_ips.txt`.
### 3. Reload Fail2Ban
```bash
sudo systemctl restart fail2ban
```
Verify the jail is active:
```bash
sudo fail2ban-client status krawl
```
## How It Works
### When an IP is Added to malicious_ips.txt
1. **Fail2ban detects the new line** in the log file (via inotify)
2. **Filter regex matches** the IP address pattern
3. **maxretry check:** Since maxretry=1, ban immediately
4. **Action triggered:** `iptables-allports` adds a firewall block rule
5. **IP is blocked** on all ports and protocols
### When the 30-Day Rotation Occurs
Your malicious IPs file is rotated every 30 days. With `bantime = 2592000` (30 days), bans expire on the same schedule, so an IP that drops out of the file is eventually unbanned.
If you used `bantime = -1` (permanent), old IPs would remain banned forever even after removal from the file. This option is not recommended because external IPs can rotate and are unlikely to be static.
## Monitoring
### Check Currently Banned IPs
```bash
sudo fail2ban-client status krawl
```
### View Fail2Ban Logs
```bash
sudo tail -f /var/log/fail2ban.log | grep krawl
```
## Management Commands
### Manually Ban an IP
```bash
sudo fail2ban-client set krawl banip 192.168.1.100
```
### Manually Unban an IP
```bash
sudo fail2ban-client set krawl unbanip 192.168.1.100
```
### Clear All Bans in Krawl Jail
```bash
sudo fail2ban-client set krawl unbanall
```
### Restart the Krawl Jail Only
```bash
sudo fail2ban-client restart krawl
```
## References
- [Fail2Ban Documentation](https://www.fail2ban.org/wiki/index.php/Main_Page)
- [Fail2Ban Configuration Manual](https://www.fail2ban.org/wiki/index.php/Jail.conf)
- [Iptables Basics](https://www.digitalocean.com/community/tutorials/iptables-essentials-common-firewall-rules-and-commands)

View File

@@ -0,0 +1,2 @@
[Definition]
failregex = ^<HOST>$

View File

@@ -0,0 +1,9 @@
[krawl]
enabled = true
filter = krawl-filter
logpath = /path/to/malicious_ips.txt ; update this path to where your krawl malicious IPs are logged
backend = auto
maxretry = 1
findtime = 1
bantime = 2592000 ; 30 days
action = iptables-allports[name=krawl-ban, port=all, protocol=all]

302
plugins/iptables/README.md Normal file
View File

@@ -0,0 +1,302 @@
# Iptables + Krawl Integration
## Overview
This guide explains how to integrate **iptables** with Krawl to automatically block detected malicious IPs at the firewall level. The iptables integration fetches the malicious IP list directly from Krawl's API and applies firewall rules.
## Architecture
```
Krawl detects malicious IPs
Stores in database
API endpoint
Cron job fetches list
Iptables firewall blocks IPs
All traffic from banned IPs dropped
```
## Prerequisites
- Linux system with iptables installed (typically pre-installed)
- Krawl running with API accessible
- Root/sudo access
- Curl or wget for HTTP requests
- Cron for scheduling (or systemd timer as alternative)
## Installation & Setup
### 1. Create the [krawl-iptables.sh](krawl-iptables.sh) script
```bash
#!/bin/bash
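# Quote the URL so the shell does not treat "?" as a glob character
curl -s "https://your-krawl-instance/your-dashboard-path/api/get_banlist?fwtype=iptables" | while read -r ip; do
    # Add the DROP rule only if an identical rule is not already present
    iptables -C INPUT -s "$ip" -j DROP 2>/dev/null || iptables -A INPUT -s "$ip" -j DROP
done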
```
Make it executable:
```bash
sudo chmod +x ./krawl-iptables.sh
```
### 2. Test the Script
```bash
sudo ./krawl-iptables.sh
```
### 3. Schedule with Cron
Edit root crontab:
```bash
sudo crontab -e
```
Add this line to sync IPs every hour:
```bash
0 * * * * /path/to/krawl-iptables.sh
```
## How It Works
### When the Script Runs
1. **Fetch IPs** from Krawl API (`/api/get_banlist?fwtype=iptables`)
2. **Add new DROP rules** for each IP
3. **Rules are applied immediately** at kernel level
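
The steps above can be sketched as a pure planning function: given the fetched list and the addresses that already have a rule, it returns only the `iptables` commands still needed. It assumes the API returns one IP per line, as the shell script does; the function name is hypothetical and nothing is executed here.

```python
import ipaddress


def plan_rules(banlist: list, existing: set) -> list:
    """Return iptables argument lists for IPs that are not yet blocked."""
    commands = []
    seen = set(existing)
    for raw in banlist:
        ip = raw.strip()
        if not ip or ip in seen:
            continue  # blank line, duplicate, or already blocked
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            continue  # never pass unvalidated input to iptables
        seen.add(ip)
        commands.append(["iptables", "-A", "INPUT", "-s", ip, "-j", "DROP"])
    return commands
```

Each returned list could then be handed to `subprocess.run` by a privileged caller; validating first avoids feeding arbitrary API output into a root command.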
## Monitoring
### Check Active Rules
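View all rules in the `KRAWL-BAN` chain. This and the following commands assume the DROP rules live in a dedicated `KRAWL-BAN` chain; the script above appends them to `INPUT`, so substitute `INPUT` if you use it unchanged: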
```bash
sudo iptables -L KRAWL-BAN -n
```
Count blocked IPs:
```bash
sudo iptables -L KRAWL-BAN -n | tail -n +3 | wc -l
```
### Check Script Logs
```bash
sudo tail -f /var/log/krawl-iptables-sync.log
```
### Monitor in Real-Time
Watch dropped packets (requires kernel logging):
```bash
sudo tail -f /var/log/syslog | grep "IN=.*OUT="
```
## Management Commands
### Manually Block an IP
```bash
sudo iptables -A KRAWL-BAN -s 192.168.1.100 -j DROP
```
### Manually Unblock an IP
```bash
sudo iptables -D KRAWL-BAN -s 192.168.1.100 -j DROP
```
### List All Blocked IPs
```bash
sudo iptables -L KRAWL-BAN -n | grep DROP
```
### Clear All Rules
```bash
sudo iptables -F KRAWL-BAN
```
### Disable the Chain (Temporarily)
```bash
sudo iptables -D INPUT -j KRAWL-BAN
```
### Re-enable the Chain
```bash
sudo iptables -I INPUT -j KRAWL-BAN
```
### View Statistics
```bash
sudo iptables -L KRAWL-BAN -n -v
```
## Persistent Rules (Survive Reboot)
### Save Current Rules
```bash
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
```
### Restore on Boot
Install iptables-persistent:
```bash
sudo apt-get install iptables-persistent
```
During installation, choose "Yes" to save current IPv4 and IPv6 rules.
To update later:
```bash
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
sudo systemctl restart netfilter-persistent
```
## Performance Considerations
### For Your Setup
- **Minimal overhead** — Iptables rules are processed at kernel level (very fast)
- **No logging I/O** — Blocked IPs are dropped before application sees them
- **Scales to thousands** — Iptables can efficiently handle 10,000+ rules
### Optimization Tips
1. **Use a custom chain** — Isolates Krawl rules from other firewall rules
2. **Schedule appropriately** — Every hour is usually sufficient; adjust based on threat level
3. **Monitor rule count** — Check periodically to ensure the script is working
4. **Consider IPSET** — For 10,000+ IPs, use ipset instead (more efficient)
### Using IPSET (Advanced)
For large-scale deployments, ipset is more efficient than individual iptables rules:
```bash
# Create ipset
sudo ipset create krawl-ban hash:ip
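# Add IPs to ipset (-exist skips addresses that are already present)
curl -s "https://your-krawl-instance/your-dashboard-path/api/get_banlist?fwtype=iptables" | while read -r ip; do
    [ -z "$ip" ] && continue
    sudo ipset add krawl-ban "$ip" -exist
done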
# Single iptables rule references the ipset
sudo iptables -I INPUT -m set --match-set krawl-ban src -j DROP
```
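
For large lists, calling `ipset add` once per address is slow. The sketch below, in Python, renders the whole list as input for `ipset restore` instead. It assumes the set name `krawl-ban` used above; the function name `ipset_restore_script` is hypothetical.

```python
import ipaddress


def ipset_restore_script(ips: list[str], set_name: str = "krawl-ban") -> str:
    """Render input for `ipset restore` that creates the set and adds each IP.

    `-exist` makes both the create line and the add lines safe to replay.
    """
    lines = [f"create {set_name} hash:ip -exist"]
    for ip in ips:
        ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
        lines.append(f"add {set_name} {ip} -exist")
    return "\n".join(lines) + "\n"
```

The returned text can be piped to `sudo ipset restore`, which loads all entries in one invocation.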
## Troubleshooting
### Script Fails to Fetch the IP List
Check API connectivity:
```bash
curl "http://your-krawl-instance/your-dashboard-path/api/get_banlist?fwtype=iptables"
```
Verify:
- Krawl is running
- API URL is correct
- Firewall allows outbound HTTPS/HTTP
- No authentication required
### Iptables Rules Not Persisting After Reboot
Install and configure iptables-persistent:
```bash
sudo apt-get install iptables-persistent
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
```
### Script Runs but No Rules Added
Check if chain exists:
```bash
sudo iptables -L KRAWL-BAN -n 2>&1 | head -1
```
Check logs for errors:
```bash
sudo grep ERROR /var/log/krawl-iptables-sync.log
```
Verify IP format in Krawl API response:
```bash
curl "http://your-krawl-instance/your-dashboard-path/api/get_banlist?fwtype=iptables" | head -10
```
### Blocked Legitimate Traffic
Check what IPs are blocked:
```bash
sudo iptables -L KRAWL-BAN -n | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
```
Unblock an IP:
```bash
sudo iptables -D KRAWL-BAN -s 203.0.113.50 -j DROP
```
Report false positive to Krawl administrators.
## Security Best Practices
1. **Limit API access** — Restrict `/api/get_banlist` to trusted networks (if internal use)
2. **Use HTTPS** — Fetch from HTTPS endpoint if available
3. **Verify TLS certificates** — Add `-k` only if necessary, not by default
4. **Rate limit cron jobs** — Don't run too frequently to avoid DoS
5. **Monitor sync logs** — Alert on repeated failures
6. **Backup rules** — Periodically backup `/etc/iptables/rules.v4`
## Integration with Krawl Workflow
Combined with fail2ban and iptables:
```
Real-time events (fail2ban)
Immediate IP bans (temporary)
Hourly sync (iptables cron)
Permanent block until next rotation
30-day cleanup cycle
```
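
The sync script shown earlier only appends rules, so an address that later leaves the banlist stays blocked until the rules are flushed. Below is a hedged sketch in Python of working out what a full sync would add and remove. It assumes Krawl's cleanup cycle removes expired addresses from the API response, which this guide does not confirm. `plan_sync` is an illustrative name.

```python
def plan_sync(current: set[str], desired: set[str]) -> tuple[list[str], list[str]]:
    """Return (to_add, to_remove) so the firewall matches the latest banlist.

    `current` is what the firewall blocks now; `desired` is the fetched list.
    """
    to_add = sorted(desired - current)
    to_remove = sorted(current - desired)
    return to_add, to_remove
```

Each entry in `to_remove` corresponds to one `iptables -D` call, and each entry in `to_add` to one `iptables -A` call.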
## Manual Integration Example
Instead of cron, manually fetch and block:
```bash
# Fetch malicious IPs
curl -s "http://your-krawl-instance/your-dashboard-path/api/get_banlist?fwtype=iptables" > /tmp/malicious_ips.txt
# Read and block each IP
while read -r ip; do
    sudo iptables -A KRAWL-BAN -s "$ip" -j DROP
done < /tmp/malicious_ips.txt
# Save rules (the redirect must run as root, hence sh -c)
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
```
## References
- [Iptables Man Page](https://linux.die.net/man/8/iptables)
- [Iptables Essentials](https://www.digitalocean.com/community/tutorials/iptables-essentials-common-firewall-rules-and-commands)
- [Ipset Documentation](https://ipset.netfilter.org/)
- [Netfilter Conntrack Sysctl Documentation](https://www.kernel.org/doc/html/latest/networking/nf_conntrack-sysctl.html)


@@ -0,0 +1,6 @@
#!/bin/bash
# This script fetches a list of malicious IPs from your Krawl instance and adds them to the iptables firewall to block incoming traffic from those IPs.
curl -s "https://your-krawl-instance/your-dashboard-path/api/get_banlist?fwtype=iptables" | while read -r ip; do
    [ -z "$ip" ] && continue
    iptables -C INPUT -s "$ip" -j DROP 2>/dev/null || iptables -A INPUT -s "$ip" -j DROP
done

plugins/nftables/README.md Normal file

@@ -0,0 +1,161 @@
# Nftables + Krawl Integration
## Overview
This guide explains how to integrate **nftables** with Krawl to automatically block detected malicious IPs at the firewall level. Nftables is the modern replacement for iptables on newer Linux systems and provides more efficient IP set-based blocking.
## Architecture
```
Krawl detects malicious IPs
Stores in database
API endpoint
Cron job fetches list
Nftables firewall blocks IPs
All traffic from banned IPs dropped
```
## Prerequisites
- Modern Linux system with nftables installed (Ubuntu 22+, Debian 12+, RHEL 9+)
- Krawl running with API accessible
- Root/sudo access
- Curl for HTTP requests
- Cron for scheduling
## When to Use Nftables
Check if your system uses nftables:
```bash
sudo nft list tables
```
If the command lists tables, your system is using nftables. If it fails with `command not found`, follow the iptables integration instead.
## Installation & Setup
### 1. Create the [krawl-nftables.sh](krawl-nftables.sh) script
```bash
#!/bin/bash
curl -s "https://your-krawl-instance/your-dashboard-path/api/get_banlist?fwtype=iptables" > /tmp/ips_to_ban.txt
sudo nft add set inet filter krawl_ban { type ipv4_addr \; } 2>/dev/null || true
while read -r ip; do
[[ -z "$ip" ]] && continue
sudo nft add element inet filter krawl_ban { "$ip" }
done < /tmp/ips_to_ban.txt
sudo nft list chain inet filter input 2>/dev/null | grep -q '@krawl_ban' || sudo nft add rule inet filter input ip saddr @krawl_ban counter drop 2>/dev/null || true
rm -f /tmp/ips_to_ban.txt
```
Make it executable:
```bash
sudo chmod +x ./krawl-nftables.sh
```
### 2. Test the Script
```bash
sudo ./krawl-nftables.sh
```
### 3. Schedule with Cron
Edit root crontab:
```bash
sudo crontab -e
```
Add this line to sync IPs every hour:
```bash
0 * * * * /path/to/krawl-nftables.sh
```
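
The script above runs `nft add element` once per address. Because nftables accepts several elements in one command, the list can also be applied as a single batch. Below is a minimal sketch in Python, assuming the `inet filter` table and the `krawl_ban` set from the script. `nft_batch` is a hypothetical helper name.

```python
import ipaddress


def nft_batch(ips: list[str], table: str = "inet filter", set_name: str = "krawl_ban") -> str:
    """Render an nft script that adds every address in one `add element` command."""
    # Skip blank entries; IPv4Address raises ValueError on anything malformed.
    valid = [str(ipaddress.IPv4Address(ip.strip())) for ip in ips if ip.strip()]
    if not valid:
        return ""  # an empty element list is not valid nft syntax, so emit nothing
    return f"add element {table} {set_name} {{ {', '.join(valid)} }}\n"
```

Writing the result to a file and loading it with `sudo nft -f <file>` applies all additions in one transaction.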
## Monitoring
### Check the Blocking Set
View blocked IPs:
```bash
sudo nft list set inet filter krawl_ban
```
Count blocked IPs:
```bash
sudo nft list set inet filter krawl_ban | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | wc -l
```
### Check Active Rules
View all rules in the filter table:
```bash
sudo nft list table inet filter
```
Find Krawl-specific rules:
```bash
sudo nft list chain inet filter input | grep krawl_ban
```
### Monitor in Real-Time
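Watch the packet counter on the Krawl rule (the rule is added with `counter`, so each dropped packet increments it):
```bash
sudo watch -n 5 'nft list chain inet filter input | grep krawl_ban'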
```
## Management Commands
### Manually Block an IP
```bash
sudo nft add element inet filter krawl_ban { 192.168.1.100 }
```
### Manually Unblock an IP
```bash
sudo nft delete element inet filter krawl_ban { 192.168.1.100 }
```
### List All Blocked IPs
```bash
sudo nft list set inet filter krawl_ban
```
### Clear All Blocked IPs
```bash
sudo nft flush set inet filter krawl_ban
```
### Delete the Rule
```bash
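# Find the rule's handle first, then substitute it below
sudo nft -a list chain inet filter input | grep krawl_ban
sudo nft delete rule inet filter input handle <handle>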
```
## References
- [Nftables Official Documentation](https://wiki.nftables.org/)
- [Nftables Quick Reference](https://wiki.nftables.org/wiki-nftables/index.php/Quick_reference-nftables_in_10_minutes)
- [Linux Kernel Netfilter Guide](https://www.kernel.org/doc/html/latest/networking/netfilter/)
- [Nftables Man Page](https://man.archlinux.org/man/nft.8)


@@ -0,0 +1,19 @@
#!/bin/bash
# Fetch malicious IPs to temporary file
curl -s "https://your-krawl-instance/your-dashboard-path/api/get_banlist?fwtype=iptables" > /tmp/ips_to_ban.txt
# Create the set if it doesn't exist
sudo nft add set inet filter krawl_ban { type ipv4_addr \; } 2>/dev/null || true
# Add IPs to the set
while read -r ip; do
[[ -z "$ip" ]] && continue
sudo nft add element inet filter krawl_ban { "$ip" }
done < /tmp/ips_to_ban.txt
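# Create the rule only if the input chain does not already reference the set
sudo nft list chain inet filter input 2>/dev/null | grep -q '@krawl_ban' || sudo nft add rule inet filter input ip saddr @krawl_ban counter drop 2>/dev/null || true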
# Cleanup
rm -f /tmp/ips_to_ban.txt


@@ -11,3 +11,9 @@ SQLAlchemy>=2.0.0,<3.0.0
APScheduler>=3.11.2
requests>=2.32.5
# Web framework
fastapi>=0.115.0
uvicorn[standard]>=0.30.0
jinja2>=3.1.5
python-multipart>=0.0.9


@@ -1,342 +0,0 @@
#!/usr/bin/env python3
from sqlalchemy import select
from typing import Optional
from database import get_database, DatabaseManager
from zoneinfo import ZoneInfo
from pathlib import Path
from datetime import datetime, timedelta
import re
import urllib.parse
from wordlists import get_wordlists
from config import get_config
from logger import get_app_logger
import requests
"""
Functions for user activity analysis
"""
app_logger = get_app_logger()
class Analyzer:
"""
Analyzes users activity and produces aggregated insights
"""
def __init__(self, db_manager: Optional[DatabaseManager] = None):
"""
Initialize the analyzer.
Args:
db_manager: Optional DatabaseManager for persistence.
If None, will use the global singleton.
"""
self._db_manager = db_manager
@property
def db(self) -> Optional[DatabaseManager]:
"""
Get the database manager, lazily initializing if needed.
Returns:
DatabaseManager instance or None if not available
"""
if self._db_manager is None:
try:
self._db_manager = get_database()
except Exception:
pass
return self._db_manager
# def infer_user_category(self, ip: str) -> str:
# config = get_config()
# http_risky_methods_threshold = config.http_risky_methods_threshold
# violated_robots_threshold = config.violated_robots_threshold
# uneven_request_timing_threshold = config.uneven_request_timing_threshold
# user_agents_used_threshold = config.user_agents_used_threshold
# attack_urls_threshold = config.attack_urls_threshold
# uneven_request_timing_time_window_seconds = config.uneven_request_timing_time_window_seconds
# app_logger.debug(f"http_risky_methods_threshold: {http_risky_methods_threshold}")
# score = {}
# score["attacker"] = {"risky_http_methods": False, "robots_violations": False, "uneven_request_timing": False, "different_user_agents": False, "attack_url": False}
# score["good_crawler"] = {"risky_http_methods": False, "robots_violations": False, "uneven_request_timing": False, "different_user_agents": False, "attack_url": False}
# score["bad_crawler"] = {"risky_http_methods": False, "robots_violations": False, "uneven_request_timing": False, "different_user_agents": False, "attack_url": False}
# score["regular_user"] = {"risky_http_methods": False, "robots_violations": False, "uneven_request_timing": False, "different_user_agents": False, "attack_url": False}
# #1-3 low, 4-6 mid, 7-9 high, 10-20 extreme
# weights = {
# "attacker": {
# "risky_http_methods": 6,
# "robots_violations": 4,
# "uneven_request_timing": 3,
# "different_user_agents": 8,
# "attack_url": 15
# },
# "good_crawler": {
# "risky_http_methods": 1,
# "robots_violations": 0,
# "uneven_request_timing": 0,
# "different_user_agents": 0,
# "attack_url": 0
# },
# "bad_crawler": {
# "risky_http_methods": 2,
# "robots_violations": 7,
# "uneven_request_timing": 0,
# "different_user_agents": 5,
# "attack_url": 5
# },
# "regular_user": {
# "risky_http_methods": 0,
# "robots_violations": 0,
# "uneven_request_timing": 8,
# "different_user_agents": 3,
# "attack_url": 0
# }
# }
# accesses = self.db.get_access_logs(ip_filter = ip, limit=1000)
# total_accesses_count = len(accesses)
# if total_accesses_count <= 0:
# return
# # Set category as "unknown" for the first 5 requests
# if total_accesses_count < 3:
# category = "unknown"
# analyzed_metrics = {}
# category_scores = {"attacker": 0, "good_crawler": 0, "bad_crawler": 0, "regular_user": 0, "unknown": 0}
# last_analysis = datetime.now(tz=ZoneInfo('UTC'))
# self._db_manager.update_ip_stats_analysis(ip, analyzed_metrics, category, category_scores, last_analysis)
# return 0
# #--------------------- HTTP Methods ---------------------
# get_accesses_count = len([item for item in accesses if item["method"] == "GET"])
# post_accesses_count = len([item for item in accesses if item["method"] == "POST"])
# put_accesses_count = len([item for item in accesses if item["method"] == "PUT"])
# delete_accesses_count = len([item for item in accesses if item["method"] == "DELETE"])
# head_accesses_count = len([item for item in accesses if item["method"] == "HEAD"])
# options_accesses_count = len([item for item in accesses if item["method"] == "OPTIONS"])
# patch_accesses_count = len([item for item in accesses if item["method"] == "PATCH"])
# if total_accesses_count > http_risky_methods_threshold:
# http_method_attacker_score = (post_accesses_count + put_accesses_count + delete_accesses_count + options_accesses_count + patch_accesses_count) / total_accesses_count
# else:
# http_method_attacker_score = 0
# #print(f"HTTP Method attacker score: {http_method_attacker_score}")
# if http_method_attacker_score >= http_risky_methods_threshold:
# score["attacker"]["risky_http_methods"] = True
# score["good_crawler"]["risky_http_methods"] = False
# score["bad_crawler"]["risky_http_methods"] = True
# score["regular_user"]["risky_http_methods"] = False
# else:
# score["attacker"]["risky_http_methods"] = False
# score["good_crawler"]["risky_http_methods"] = True
# score["bad_crawler"]["risky_http_methods"] = False
# score["regular_user"]["risky_http_methods"] = False
# #--------------------- Robots Violations ---------------------
# #respect robots.txt and login/config pages access frequency
# robots_disallows = []
# robots_path = Path(__file__).parent / "templates" / "html" / "robots.txt"
# with open(robots_path, "r") as f:
# for line in f:
# line = line.strip()
# if not line:
# continue
# parts = line.split(":")
# if parts[0] == "Disallow":
# parts[1] = parts[1].rstrip("/")
# #print(f"DISALLOW {parts[1]}")
# robots_disallows.append(parts[1].strip())
# #if 0 100% sure is good crawler, if >10% of robots violated is bad crawler or attacker
# violated_robots_count = len([item for item in accesses if any(item["path"].rstrip("/").startswith(disallow) for disallow in robots_disallows)])
# #print(f"Violated robots count: {violated_robots_count}")
# if total_accesses_count > 0:
# violated_robots_ratio = violated_robots_count / total_accesses_count
# else:
# violated_robots_ratio = 0
# if violated_robots_ratio >= violated_robots_threshold:
# score["attacker"]["robots_violations"] = True
# score["good_crawler"]["robots_violations"] = False
# score["bad_crawler"]["robots_violations"] = True
# score["regular_user"]["robots_violations"] = False
# else:
# score["attacker"]["robots_violations"] = False
# score["good_crawler"]["robots_violations"] = False
# score["bad_crawler"]["robots_violations"] = False
# score["regular_user"]["robots_violations"] = False
# #--------------------- Requests Timing ---------------------
# #Request rate and timing: steady, throttled, polite vs attackers' bursty, aggressive, or oddly rhythmic behavior
# timestamps = [datetime.fromisoformat(item["timestamp"]) for item in accesses]
# now_utc = datetime.now(tz=ZoneInfo('UTC'))
# timestamps = [ts for ts in timestamps if now_utc - ts <= timedelta(seconds=uneven_request_timing_time_window_seconds)]
# timestamps = sorted(timestamps, reverse=True)
# time_diffs = []
# for i in range(0, len(timestamps)-1):
# diff = (timestamps[i] - timestamps[i+1]).total_seconds()
# time_diffs.append(diff)
# mean = 0
# variance = 0
# std = 0
# cv = 0
# if time_diffs:
# mean = sum(time_diffs) / len(time_diffs)
# variance = sum((x - mean) ** 2 for x in time_diffs) / len(time_diffs)
# std = variance ** 0.5
# cv = std/mean
# app_logger.debug(f"Mean: {mean} - Variance {variance} - Standard Deviation {std} - Coefficient of Variation: {cv}")
# if cv >= uneven_request_timing_threshold:
# score["attacker"]["uneven_request_timing"] = True
# score["good_crawler"]["uneven_request_timing"] = False
# score["bad_crawler"]["uneven_request_timing"] = False
# score["regular_user"]["uneven_request_timing"] = True
# else:
# score["attacker"]["uneven_request_timing"] = False
# score["good_crawler"]["uneven_request_timing"] = False
# score["bad_crawler"]["uneven_request_timing"] = False
# score["regular_user"]["uneven_request_timing"] = False
# #--------------------- Different User Agents ---------------------
# #Header Quality and Consistency: Crawlers tend to use complete and consistent headers, attackers might miss, fake, or change headers
# user_agents_used = [item["user_agent"] for item in accesses]
# user_agents_used = list(dict.fromkeys(user_agents_used))
# #print(f"User agents used: {user_agents_used}")
# if len(user_agents_used) >= user_agents_used_threshold:
# score["attacker"]["different_user_agents"] = True
# score["good_crawler"]["different_user_agents"] = False
# score["bad_crawler"]["different_user_agentss"] = True
# score["regular_user"]["different_user_agents"] = False
# else:
# score["attacker"]["different_user_agents"] = False
# score["good_crawler"]["different_user_agents"] = False
# score["bad_crawler"]["different_user_agents"] = False
# score["regular_user"]["different_user_agents"] = False
# #--------------------- Attack URLs ---------------------
# attack_urls_found_list = []
# wl = get_wordlists()
# if wl.attack_patterns:
# queried_paths = [item["path"] for item in accesses]
# for queried_path in queried_paths:
# # URL decode the path to catch encoded attacks
# try:
# decoded_path = urllib.parse.unquote(queried_path)
# # Double decode to catch double-encoded attacks
# decoded_path_twice = urllib.parse.unquote(decoded_path)
# except Exception:
# decoded_path = queried_path
# decoded_path_twice = queried_path
# for name, pattern in wl.attack_patterns.items():
# # Check original, decoded, and double-decoded paths
# if (re.search(pattern, queried_path, re.IGNORECASE) or
# re.search(pattern, decoded_path, re.IGNORECASE) or
# re.search(pattern, decoded_path_twice, re.IGNORECASE)):
# attack_urls_found_list.append(f"{name}: {pattern}")
# #remove duplicates
# attack_urls_found_list = set(attack_urls_found_list)
# attack_urls_found_list = list(attack_urls_found_list)
# if len(attack_urls_found_list) > attack_urls_threshold:
# score["attacker"]["attack_url"] = True
# score["good_crawler"]["attack_url"] = False
# score["bad_crawler"]["attack_url"] = False
# score["regular_user"]["attack_url"] = False
# else:
# score["attacker"]["attack_url"] = False
# score["good_crawler"]["attack_url"] = False
# score["bad_crawler"]["attack_url"] = False
# score["regular_user"]["attack_url"] = False
# #--------------------- Calculate score ---------------------
# attacker_score = good_crawler_score = bad_crawler_score = regular_user_score = 0
# attacker_score = score["attacker"]["risky_http_methods"] * weights["attacker"]["risky_http_methods"]
# attacker_score = attacker_score + score["attacker"]["robots_violations"] * weights["attacker"]["robots_violations"]
# attacker_score = attacker_score + score["attacker"]["uneven_request_timing"] * weights["attacker"]["uneven_request_timing"]
# attacker_score = attacker_score + score["attacker"]["different_user_agents"] * weights["attacker"]["different_user_agents"]
# attacker_score = attacker_score + score["attacker"]["attack_url"] * weights["attacker"]["attack_url"]
# good_crawler_score = score["good_crawler"]["risky_http_methods"] * weights["good_crawler"]["risky_http_methods"]
# good_crawler_score = good_crawler_score + score["good_crawler"]["robots_violations"] * weights["good_crawler"]["robots_violations"]
# good_crawler_score = good_crawler_score + score["good_crawler"]["uneven_request_timing"] * weights["good_crawler"]["uneven_request_timing"]
# good_crawler_score = good_crawler_score + score["good_crawler"]["different_user_agents"] * weights["good_crawler"]["different_user_agents"]
# good_crawler_score = good_crawler_score + score["good_crawler"]["attack_url"] * weights["good_crawler"]["attack_url"]
# bad_crawler_score = score["bad_crawler"]["risky_http_methods"] * weights["bad_crawler"]["risky_http_methods"]
# bad_crawler_score = bad_crawler_score + score["bad_crawler"]["robots_violations"] * weights["bad_crawler"]["robots_violations"]
# bad_crawler_score = bad_crawler_score + score["bad_crawler"]["uneven_request_timing"] * weights["bad_crawler"]["uneven_request_timing"]
# bad_crawler_score = bad_crawler_score + score["bad_crawler"]["different_user_agents"] * weights["bad_crawler"]["different_user_agents"]
# bad_crawler_score = bad_crawler_score + score["bad_crawler"]["attack_url"] * weights["bad_crawler"]["attack_url"]
# regular_user_score = score["regular_user"]["risky_http_methods"] * weights["regular_user"]["risky_http_methods"]
# regular_user_score = regular_user_score + score["regular_user"]["robots_violations"] * weights["regular_user"]["robots_violations"]
# regular_user_score = regular_user_score + score["regular_user"]["uneven_request_timing"] * weights["regular_user"]["uneven_request_timing"]
# regular_user_score = regular_user_score + score["regular_user"]["different_user_agents"] * weights["regular_user"]["different_user_agents"]
# regular_user_score = regular_user_score + score["regular_user"]["attack_url"] * weights["regular_user"]["attack_url"]
# score_details = f"""
# Attacker score: {attacker_score}
# Good Crawler score: {good_crawler_score}
# Bad Crawler score: {bad_crawler_score}
# Regular User score: {regular_user_score}
# """
# app_logger.debug(score_details)
# analyzed_metrics = {"risky_http_methods": http_method_attacker_score, "robots_violations": violated_robots_ratio, "uneven_request_timing": mean, "different_user_agents": user_agents_used, "attack_url": attack_urls_found_list}
# category_scores = {"attacker": attacker_score, "good_crawler": good_crawler_score, "bad_crawler": bad_crawler_score, "regular_user": regular_user_score}
# category = max(category_scores, key=category_scores.get)
# last_analysis = datetime.now(tz=ZoneInfo('UTC'))
# self._db_manager.update_ip_stats_analysis(ip, analyzed_metrics, category, category_scores, last_analysis)
# return 0
# def update_ip_rep_infos(self, ip: str) -> list[str]:
# api_url = "https://iprep.lcrawl.com/api/iprep/"
# params = {
# "cidr": ip
# }
# headers = {
# "Content-Type": "application/json"
# }
# response = requests.get(api_url, headers=headers, params=params)
# payload = response.json()
# if payload["results"]:
# data = payload["results"][0]
# country_iso_code = data["geoip_data"]["country_iso_code"]
# asn = data["geoip_data"]["asn_autonomous_system_number"]
# asn_org = data["geoip_data"]["asn_autonomous_system_organization"]
# list_on = data["list_on"]
# sanitized_country_iso_code = sanitize_for_storage(country_iso_code, 3)
# sanitized_asn = sanitize_for_storage(asn, 100)
# sanitized_asn_org = sanitize_for_storage(asn_org, 100)
# sanitized_list_on = sanitize_dict(list_on, 100000)
# self._db_manager.update_ip_rep_infos(ip, sanitized_country_iso_code, sanitized_asn, sanitized_asn_org, sanitized_list_on)
# return

src/app.py Normal file

@@ -0,0 +1,151 @@
#!/usr/bin/env python3
"""
FastAPI application factory for the Krawl honeypot.
Replaces the old http.server-based server.py.
"""
import sys
import os
from contextlib import asynccontextmanager
from fastapi import FastAPI, Request, Response
from fastapi.staticfiles import StaticFiles
from config import get_config
from tracker import AccessTracker, set_tracker
from database import initialize_database
from tasks_master import get_tasksmaster
from logger import initialize_logging, get_app_logger
from generators import random_server_header
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application startup and shutdown lifecycle."""
config = get_config()
# Initialize logging
initialize_logging(log_level=config.log_level)
app_logger = get_app_logger()
# Initialize database and run pending migrations before accepting traffic
try:
app_logger.info(f"Initializing database at: {config.database_path}")
initialize_database(config.database_path)
app_logger.info("Database ready")
except Exception as e:
app_logger.warning(
f"Database initialization failed: {e}. Continuing with in-memory only."
)
# Initialize tracker
tracker = AccessTracker(config.max_pages_limit, config.ban_duration_seconds)
set_tracker(tracker)
# Store in app.state for dependency injection
app.state.config = config
app.state.tracker = tracker
# Load webpages file if provided via env var
webpages = None
webpages_file = os.environ.get("KRAWL_WEBPAGES_FILE")
if webpages_file:
try:
with open(webpages_file, "r") as f:
webpages = f.readlines()
if not webpages:
app_logger.warning(
"The webpages file was empty. Using randomly generated links."
)
webpages = None
except IOError:
app_logger.warning(
"Can't read webpages file. Using randomly generated links."
)
app.state.webpages = webpages
# Initialize canary counter
app.state.counter = config.canary_token_tries
# Start scheduled tasks
tasks_master = get_tasksmaster()
tasks_master.run_scheduled_tasks()
banner = f"""
============================================================
DASHBOARD AVAILABLE AT
{config.dashboard_secret_path}
============================================================
"""
app_logger.info(banner)
app_logger.info(f"Starting deception server on port {config.port}...")
if config.canary_token_url:
app_logger.info(
f"Canary token will appear after {config.canary_token_tries} tries"
)
else:
app_logger.info("No canary token configured (set CANARY_TOKEN_URL to enable)")
yield
# Shutdown
app_logger.info("Server shutting down...")
def create_app() -> FastAPI:
"""Create and configure the FastAPI application."""
application = FastAPI(
docs_url=None,
redoc_url=None,
openapi_url=None,
lifespan=lifespan,
)
# Random server header middleware (innermost — runs last on request, first on response)
@application.middleware("http")
async def server_header_middleware(request: Request, call_next):
response: Response = await call_next(request)
response.headers["Server"] = random_server_header()
return response
# Deception detection middleware (path traversal, XXE, command injection)
from middleware.deception import DeceptionMiddleware
application.add_middleware(DeceptionMiddleware)
# Banned IP check middleware (outermost — runs first on request)
from middleware.ban_check import BanCheckMiddleware
application.add_middleware(BanCheckMiddleware)
# Mount static files for the dashboard
config = get_config()
secret = config.dashboard_secret_path.lstrip("/")
static_dir = os.path.join(os.path.dirname(__file__), "templates", "static")
application.mount(
f"/{secret}/static",
StaticFiles(directory=static_dir),
name="dashboard-static",
)
# Import and include routers
from routes.honeypot import router as honeypot_router
from routes.api import router as api_router
from routes.dashboard import router as dashboard_router
from routes.htmx import router as htmx_router
# Dashboard/API/HTMX routes (prefixed with secret path, before honeypot catch-all)
dashboard_prefix = f"/{secret}"
application.include_router(dashboard_router, prefix=dashboard_prefix)
application.include_router(api_router, prefix=dashboard_prefix)
application.include_router(htmx_router, prefix=dashboard_prefix)
# Honeypot routes (catch-all must be last)
application.include_router(honeypot_router)
return application
app = create_app()


@@ -37,6 +37,13 @@ class Config:
infinite_pages_for_malicious: bool = True # Infinite pages for malicious crawlers
ban_duration_seconds: int = 600 # Ban duration in seconds for IPs exceeding limits
# exporter settings
exports_path: str = "exports"
# backup job settings
backups_path: str = "backups"
backups_enabled: bool = False
backups_cron: str = "*/30 * * * *"
# Database settings
database_path: str = "data/krawl.db"
database_retention_days: int = 30
@@ -49,6 +56,8 @@ class Config:
user_agents_used_threshold: float = None
attack_urls_threshold: float = None
log_level: str = "INFO"
_server_ip: Optional[str] = None
_server_ip_cache_time: float = 0
_ip_cache_ttl: int = 300
@@ -85,7 +94,7 @@ class Config:
ip = response.text.strip()
if ip:
break
except Exception:
except requests.RequestException:
continue
if not ip:
@@ -150,10 +159,13 @@ class Config:
canary = data.get("canary", {})
dashboard = data.get("dashboard", {})
api = data.get("api", {})
exports = data.get("exports", {})
backups = data.get("backups", {})
database = data.get("database", {})
behavior = data.get("behavior", {})
analyzer = data.get("analyzer") or {}
crawl = data.get("crawl", {})
logging_cfg = data.get("logging", {})
# Handle dashboard_secret_path - auto-generate if null/not set
dashboard_path = dashboard.get("secret_path")
@@ -185,6 +197,10 @@ class Config:
canary_token_tries=canary.get("token_tries", 10),
dashboard_secret_path=dashboard_path,
probability_error_codes=behavior.get("probability_error_codes", 0),
exports_path=exports.get("path", "exports"),
backups_path=backups.get("path", "backups"),
backups_enabled=backups.get("enabled", False),
backups_cron=backups.get("cron", "*/30 * * * *"),
database_path=database.get("path", "data/krawl.db"),
database_retention_days=database.get("retention_days", 30),
http_risky_methods_threshold=analyzer.get(
@@ -204,6 +220,9 @@ class Config:
),
max_pages_limit=crawl.get("max_pages_limit", 250),
ban_duration_seconds=crawl.get("ban_duration_seconds", 600),
log_level=os.getenv(
"KRAWL_LOG_LEVEL", logging_cfg.get("level", "INFO")
).upper(),
)

File diff suppressed because it is too large

src/deception_responses.py Normal file

@@ -0,0 +1,655 @@
#!/usr/bin/env python3
import re
import secrets
import logging
import json
from typing import Optional, Tuple, Dict
from generators import random_username, random_password, random_email
from wordlists import get_wordlists
logger = logging.getLogger("krawl")
_sysrand = secrets.SystemRandom()
def detect_path_traversal(path: str, query: str = "", body: str = "") -> bool:
"""Detect path traversal attempts in request"""
full_input = f"{path} {query} {body}"
wl = get_wordlists()
pattern = wl.attack_patterns.get("path_traversal", "")
if not pattern:
# Fallback pattern if wordlists not loaded
pattern = r"(\.\.|%2e%2e|/etc/passwd|/etc/shadow)"
if re.search(pattern, full_input, re.IGNORECASE):
logger.debug(f"Path traversal detected in {full_input[:100]}")
return True
return False
def detect_xxe_injection(body: str) -> bool:
"""Detect XXE injection attempts in XML payloads"""
if not body:
return False
wl = get_wordlists()
pattern = wl.attack_patterns.get("xxe_injection", "")
if not pattern:
# Fallback pattern if wordlists not loaded
pattern = r"(<!ENTITY|<!DOCTYPE|SYSTEM|PUBLIC|file://)"
if re.search(pattern, body, re.IGNORECASE):
return True
return False
def detect_command_injection(path: str, query: str = "", body: str = "") -> bool:
"""Detect command injection attempts"""
full_input = f"{path} {query} {body}"
logger.debug(
f"[CMD_INJECTION_CHECK] path='{path}' query='{query}' body='{body[:50] if body else ''}'"
)
logger.debug(f"[CMD_INJECTION_CHECK] full_input='{full_input[:200]}'")
wl = get_wordlists()
pattern = wl.attack_patterns.get("command_injection", "")
if not pattern:
# Fallback pattern if wordlists not loaded
pattern = r"(cmd=|exec=|command=|&&|;|\||\b(?:whoami|id|uname|cat|ls)\b)"
if re.search(pattern, full_input, re.IGNORECASE):
logger.debug("[CMD_INJECTION_CHECK] Command injection pattern matched!")
return True
logger.debug("[CMD_INJECTION_CHECK] No command injection detected")
return False
def generate_fake_passwd() -> str:
"""Generate fake /etc/passwd content"""
wl = get_wordlists()
passwd_config = wl.fake_passwd
if not passwd_config:
# Fallback
return "root:x:0:0:root:/root:/bin/bash\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin"
users = passwd_config.get("system_users", [])
uid_min = passwd_config.get("uid_min", 1000)
uid_max = passwd_config.get("uid_max", 2000)
gid_min = passwd_config.get("gid_min", 1000)
gid_max = passwd_config.get("gid_max", 2000)
shells = passwd_config.get("shells", ["/bin/bash"])
fake_users = [
# Reuse one generated name so the login and its home directory agree
f"{name}:x:{_sysrand.randint(uid_min, uid_max)}:{_sysrand.randint(gid_min, gid_max)}::/home/{name}:{secrets.choice(shells)}"
for name in (random_username() for _ in range(3))
]
return "\n".join(users + fake_users)
def generate_fake_shadow() -> str:
"""Generate fake /etc/shadow content"""
wl = get_wordlists()
shadow_config = wl.fake_shadow
if not shadow_config:
# Fallback
return "root:$6$rounds=656000$fake_salt_here$fake_hash_data:19000:0:99999:7:::"
entries = shadow_config.get("system_entries", [])
hash_prefix = shadow_config.get("hash_prefix", "$6$rounds=656000$")
salt_length = shadow_config.get("salt_length", 16)
hash_length = shadow_config.get("hash_length", 86)
fake_entries = [
f"{random_username()}:{hash_prefix}{''.join(_sysrand.choices('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', k=salt_length))}${''.join(_sysrand.choices('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', k=hash_length))}:19000:0:99999:7:::"
for _ in range(3)
]
return "\n".join(entries + fake_entries)
def generate_fake_config_file(filename: str) -> str:
"""Generate fake configuration file content"""
configs = {
"config.php": """<?php
define('DB_HOST', 'localhost');
define('DB_NAME', 'app_database');
define('DB_USER', 'db_user');
define('DB_PASSWORD', 'fake_pass_123');
define('SECRET_KEY', 'fake_secret_key_xyz789');
define('API_ENDPOINT', 'https://api.example.com');
?>""",
"application.properties": """# Database Configuration
spring.datasource.url=jdbc:mysql://localhost:3306/appdb
spring.datasource.username=dbuser
spring.datasource.password=fake_password_123
server.port=8080
jwt.secret=fake_jwt_secret_key_456""",
".env": """DB_HOST=localhost
DB_PORT=3306
DB_NAME=production_db
DB_USER=app_user
DB_PASSWORD=fake_env_password_789
API_KEY=fake_api_key_abc123
SECRET_TOKEN=fake_secret_token_xyz""",
}
for key in configs:
if key.lower() in filename.lower():
return configs[key]
return f"""# Configuration File
api_endpoint = https://api.example.com
api_key = fake_key_{_sysrand.randint(1000, 9999)}
database_url = mysql://user:fake_pass@localhost/db
secret = fake_secret_{_sysrand.randint(10000, 99999)}
"""
def generate_fake_directory_listing(path: str) -> str:
"""Generate fake directory listing"""
wl = get_wordlists()
dir_config = wl.directory_listing
if not dir_config:
# Fallback
return f"<html><head><title>Index of {path}</title></head><body><h1>Index of {path}</h1></body></html>"
fake_dirs = dir_config.get("fake_directories", [])
fake_files = dir_config.get("fake_files", [])
directories = [(d["name"], d["size"], d["perms"]) for d in fake_dirs]
files = [
(f["name"], str(_sysrand.randint(f["size_min"], f["size_max"])), f["perms"])
for f in fake_files
]
html = f"<html><head><title>Index of {path}</title></head><body>"
html += f"<h1>Index of {path}</h1><hr><pre>"
html += f"{'Name':<40} {'Size':<10} {'Permissions':<15}\n"
html += "-" * 70 + "\n"
for name, size, perms in directories:
html += f"{name + '/':<40} {size:<10} {perms:<15}\n"
for name, size, perms in files:
html += f"{name:<40} {size:<10} {perms:<15}\n"
html += "</pre><hr></body></html>"
return html
def generate_path_traversal_response(path: str) -> Tuple[str, str, int]:
"""Generate fake response for path traversal attempts"""
path_lower = path.lower()
logger.debug(f"Generating path traversal response for: {path}")
if "passwd" in path_lower:
logger.debug("Returning fake passwd file")
return (generate_fake_passwd(), "text/plain", 200)
if "shadow" in path_lower:
logger.debug("Returning fake shadow file")
return (generate_fake_shadow(), "text/plain", 200)
if any(
ext in path_lower for ext in [".conf", ".config", ".php", ".env", ".properties"]
):
logger.debug("Returning fake config file")
return (generate_fake_config_file(path), "text/plain", 200)
if "proc/self" in path_lower:
logger.debug("Returning fake proc info")
return (f"{_sysrand.randint(1000, 9999)}", "text/plain", 200)
logger.debug("Returning fake directory listing")
return (generate_fake_directory_listing(path), "text/html", 200)
def generate_xxe_response(body: str) -> Tuple[str, str, int]:
"""Generate fake response for XXE injection attempts"""
wl = get_wordlists()
xxe_config = wl.xxe_responses
if "file://" in body:
if "passwd" in body:
content = generate_fake_passwd()
elif "shadow" in body:
content = generate_fake_shadow()
else:
content = (
xxe_config.get("default_content", "root:x:0:0:root:/root:/bin/bash")
if xxe_config
else "root:x:0:0:root:/root:/bin/bash"
)
if xxe_config and "file_access" in xxe_config:
template = xxe_config["file_access"]["template"]
response = template.replace("{content}", content)
else:
response = f"""<?xml version="1.0"?>
<response>
<status>success</status>
<data>{content}</data>
</response>"""
return (response, "application/xml", 200)
if "ENTITY" in body:
if xxe_config and "entity_processed" in xxe_config:
template = xxe_config["entity_processed"]["template"]
entity_values = xxe_config["entity_processed"]["entity_values"]
entity_value = secrets.choice(entity_values)
response = template.replace("{entity_value}", entity_value)
else:
response = """<?xml version="1.0"?>
<response>
<status>success</status>
<message>Entity processed successfully</message>
<entity_value>fake_entity_content_12345</entity_value>
</response>"""
return (response, "application/xml", 200)
if xxe_config and "error" in xxe_config:
template = xxe_config["error"]["template"]
messages = xxe_config["error"]["messages"]
message = secrets.choice(messages)
response = template.replace("{message}", message)
else:
response = """<?xml version="1.0"?>
<response>
<status>error</status>
<message>External entity processing disabled</message>
</response>"""
return (response, "application/xml", 200)
def generate_command_injection_response(input_text: str) -> Tuple[str, str, int]:
"""Generate fake command execution output"""
wl = get_wordlists()
cmd_config = wl.command_outputs
input_lower = input_text.lower()
# id command
if re.search(r"\bid\b", input_lower):
if cmd_config and "id" in cmd_config:
uid = _sysrand.randint(
cmd_config.get("uid_min", 1000), cmd_config.get("uid_max", 2000)
)
gid = _sysrand.randint(
cmd_config.get("gid_min", 1000), cmd_config.get("gid_max", 2000)
)
template = secrets.choice(cmd_config["id"])
output = template.replace("{uid}", str(uid)).replace("{gid}", str(gid))
else:
output = f"uid={_sysrand.randint(1000, 2000)}(www-data) gid={_sysrand.randint(1000, 2000)}(www-data) groups={_sysrand.randint(1000, 2000)}(www-data)"
return (output, "text/plain", 200)
# whoami command
if re.search(r"\bwhoami\b", input_lower):
users = cmd_config.get("whoami", ["www-data"]) if cmd_config else ["www-data"]
return (secrets.choice(users), "text/plain", 200)
# uname command
if re.search(r"\buname\b", input_lower):
outputs = (
cmd_config.get("uname", ["Linux server 5.4.0 x86_64"])
if cmd_config
else ["Linux server 5.4.0 x86_64"]
)
return (secrets.choice(outputs), "text/plain", 200)
# pwd command
if re.search(r"\bpwd\b", input_lower):
paths = (
cmd_config.get("pwd", ["/var/www/html"])
if cmd_config
else ["/var/www/html"]
)
return (secrets.choice(paths), "text/plain", 200)
# ls command
if re.search(r"\bls\b", input_lower):
if cmd_config and "ls" in cmd_config:
files = secrets.choice(cmd_config["ls"])
else:
files = ["index.php", "config.php", "uploads"]
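# Clamp the lower bound so listings with fewer than 3 entries do not raise ValueError
count = _sysrand.randint(min(3, len(files)), min(6, len(files)))
output = "\n".join(_sysrand.sample(files, k=count))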
return (output, "text/plain", 200)
# cat command
if re.search(r"\bcat\b", input_lower):
if "passwd" in input_lower:
return (generate_fake_passwd(), "text/plain", 200)
if "shadow" in input_lower:
return (generate_fake_shadow(), "text/plain", 200)
cat_content = (
cmd_config.get("cat_config", "<?php\n$config = 'fake';\n?>")
if cmd_config
else "<?php\n$config = 'fake';\n?>"
)
return (cat_content, "text/plain", 200)
# echo command
if re.search(r"\becho\b", input_lower):
match = re.search(r"echo\s+(.+?)(?:[;&|]|$)", input_text, re.IGNORECASE)
if match:
return (match.group(1).strip("\"'"), "text/plain", 200)
return ("", "text/plain", 200)
# network commands
if re.search(r"\b(?:wget|curl|nc|netcat)\b", input_lower):
if cmd_config and "network_commands" in cmd_config:
outputs = cmd_config["network_commands"]
output = secrets.choice(outputs)
if "{size}" in output:
size = _sysrand.randint(
cmd_config.get("download_size_min", 100),
cmd_config.get("download_size_max", 10000),
)
output = output.replace("{size}", str(size))
else:
outputs = ["bash: command not found", "Connection timeout"]
output = secrets.choice(outputs)
return (output, "text/plain", 200)
# generic outputs
if cmd_config and "generic" in cmd_config:
generic_outputs = cmd_config["generic"]
output = secrets.choice(generic_outputs)
if "{num}" in output:
output = output.replace("{num}", str(_sysrand.randint(1, 99)))
else:
generic_outputs = ["", "Command executed successfully", "sh: syntax error"]
output = secrets.choice(generic_outputs)
return (output, "text/plain", 200)
def detect_sql_injection_pattern(query_string: str) -> Optional[str]:
"""Detect SQL injection patterns in query string"""
if not query_string:
return None
query_lower = query_string.lower()
patterns = {
"quote": [r"'", r'"', r"`"],
"comment": [r"--", r"#", r"/\*", r"\*/"],
"union": [r"\bunion\b", r"\bunion\s+select\b"],
"boolean": [r"\bor\b.*=.*", r"\band\b.*=.*", r"'.*or.*'.*=.*'"],
"time_based": [r"\bsleep\b", r"\bwaitfor\b", r"\bdelay\b", r"\bbenchmark\b"],
"stacked": [r";.*select", r";.*drop", r";.*insert", r";.*update", r";.*delete"],
"command": [r"\bexec\b", r"\bexecute\b", r"\bxp_cmdshell\b"],
"info_schema": [r"information_schema", r"table_schema", r"table_name"],
}
for injection_type, pattern_list in patterns.items():
for pattern in pattern_list:
if re.search(pattern, query_lower):
logger.debug(f"SQL injection pattern '{injection_type}' detected")
return injection_type
return None
def get_random_sql_error(
db_type: Optional[str] = None, injection_type: Optional[str] = None
) -> Tuple[str, str]:
"""Generate a random SQL error message"""
wl = get_wordlists()
sql_errors = wl.sql_errors
if not sql_errors:
return ("Database error occurred", "text/plain")
if not db_type:
db_type = secrets.choice(list(sql_errors.keys()))
db_errors = sql_errors.get(db_type, {})
if injection_type and injection_type in db_errors:
errors = db_errors[injection_type]
elif "generic" in db_errors:
errors = db_errors["generic"]
else:
all_errors = []
for error_list in db_errors.values():
if isinstance(error_list, list):
all_errors.extend(error_list)
errors = all_errors if all_errors else ["Database error occurred"]
error_message = secrets.choice(errors) if errors else "Database error occurred"
if "{table}" in error_message:
tables = ["users", "products", "orders", "customers", "accounts", "sessions"]
error_message = error_message.replace("{table}", secrets.choice(tables))
if "{column}" in error_message:
columns = ["id", "name", "email", "password", "username", "created_at"]
error_message = error_message.replace("{column}", secrets.choice(columns))
return (error_message, "text/plain")
def generate_sql_error_response(
query_string: str, db_type: Optional[str] = None
) -> Tuple[Optional[str], Optional[str], Optional[int]]:
"""Generate SQL error response for detected injection attempts"""
injection_type = detect_sql_injection_pattern(query_string)
if not injection_type:
return (None, None, None)
error_message, content_type = get_random_sql_error(db_type, injection_type)
status_code = 500
if _sysrand.random() < 0.3:
status_code = 200
logger.info(f"SQL injection detected: {injection_type}")
return (error_message, content_type, status_code)
def get_sql_response_with_data(path: str, params: str) -> str:
"""Generate fake SQL query response with data"""
injection_type = detect_sql_injection_pattern(params)
if injection_type in ["union", "boolean", "stacked"]:
data = {
"success": True,
"results": [
{
"id": i,
"username": random_username(),
"email": random_email(),
"password_hash": random_password(),
"role": secrets.choice(["admin", "user", "moderator"]),
}
for i in range(1, _sysrand.randint(2, 5))
],
}
return json.dumps(data, indent=2)
return json.dumps(
{"success": True, "message": "Query executed successfully", "results": []},
indent=2,
)
def detect_xss_pattern(input_string: str) -> bool:
"""Detect XSS patterns in input"""
if not input_string:
return False
wl = get_wordlists()
xss_pattern = wl.attack_patterns.get("xss_attempt", "")
if not xss_pattern:
xss_pattern = r"(<script|</script|javascript:|onerror=|onload=|onclick=|<iframe|<img|<svg|eval\(|alert\()"
detected = bool(re.search(xss_pattern, input_string, re.IGNORECASE))
if detected:
logger.debug("XSS pattern detected in input")
return detected
def generate_xss_response(input_data: dict) -> str:
"""Generate response for XSS attempts with reflected content"""
xss_detected = False
reflected_content = []
for key, value in input_data.items():
if detect_xss_pattern(value):
xss_detected = True
reflected_content.append(f"<p><strong>{key}:</strong> {value}</p>")
if xss_detected:
logger.info("XSS attempt detected and reflected")
html = f"""
<!DOCTYPE html>
<html>
<head>
<title>Submission Received</title>
<style>
body {{ font-family: Arial, sans-serif; max-width: 600px; margin: 50px auto; padding: 20px; }}
.success {{ background: #d4edda; padding: 20px; border-radius: 8px; border: 1px solid #c3e6cb; }}
h2 {{ color: #155724; }}
p {{ margin: 10px 0; }}
</style>
</head>
<body>
<div class="success">
<h2>Thank you for your submission!</h2>
<p>We have received your information:</p>
{''.join(reflected_content)}
<p><em>We will get back to you shortly.</em></p>
</div>
</body>
</html>
"""
return html
return """
<!DOCTYPE html>
<html>
<head>
<title>Submission Received</title>
<style>
body { font-family: Arial, sans-serif; max-width: 600px; margin: 50px auto; padding: 20px; }
.success { background: #d4edda; padding: 20px; border-radius: 8px; border: 1px solid #c3e6cb; }
h2 { color: #155724; }
</style>
</head>
<body>
<div class="success">
<h2>Thank you for your submission!</h2>
<p>Your message has been received and we will respond soon.</p>
</div>
</body>
</html>
"""
def generate_server_error() -> Tuple[str, str]:
"""Generate fake server error page"""
wl = get_wordlists()
server_errors = wl.server_errors
if not server_errors:
return ("500 Internal Server Error", "text/html")
server_type = secrets.choice(list(server_errors.keys()))
server_config = server_errors[server_type]
error_codes = {
400: "Bad Request",
401: "Unauthorized",
403: "Forbidden",
404: "Not Found",
500: "Internal Server Error",
502: "Bad Gateway",
503: "Service Unavailable",
}
code = secrets.choice(list(error_codes.keys()))
message = error_codes[code]
template = server_config.get("template", "")
version = secrets.choice(server_config.get("versions", ["1.0"]))
html = template.replace("{code}", str(code))
html = html.replace("{message}", message)
html = html.replace("{version}", version)
if server_type == "apache":
os_name = secrets.choice(server_config.get("os", ["Ubuntu"]))
html = html.replace("{os}", os_name)
html = html.replace("{host}", "localhost")
logger.debug(f"Generated {server_type} server error: {code}")
return (html, "text/html")
def get_server_header(server_type: Optional[str] = None) -> str:
"""Get a fake server header string"""
wl = get_wordlists()
server_errors = wl.server_errors
if not server_errors:
return "nginx/1.18.0"
if not server_type:
server_type = secrets.choice(list(server_errors.keys()))
server_config = server_errors.get(server_type, {})
version = secrets.choice(server_config.get("versions", ["1.0"]))
server_headers = {
"nginx": f"nginx/{version}",
"apache": f"Apache/{version}",
"iis": f"Microsoft-IIS/{version}",
"tomcat": "Apache-Coyote/1.1",
}
return server_headers.get(server_type, "nginx/1.18.0")
def detect_and_respond_deception(
path: str, query: str = "", body: str = "", method: str = "GET"
) -> Optional[Tuple[str, str, int]]:
"""
Main deception detection and response function.
Returns (response_body, content_type, status_code) if deception should be applied, None otherwise.
"""
logger.debug(
f"Checking deception for {method} {path} query={query[:50] if query else 'empty'}"
)
if detect_path_traversal(path, query, body):
logger.info(f"Path traversal detected in: {path}")
return generate_path_traversal_response(f"{path}?{query}" if query else path)
if body and detect_xxe_injection(body):
logger.info("XXE injection detected")
return generate_xxe_response(body)
if detect_command_injection(path, query, body):
logger.info(f"Command injection detected in: {path}")
full_input = f"{path} {query} {body}"
return generate_command_injection_response(full_input)
return None
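
The detectors above share one shape: join path, query and body, then run a single case-insensitive `re.search`. A minimal sketch of that flow, assuming the wordlists are not loaded so only fallback patterns apply. `FALLBACK_PATTERNS` and `classify_request` are hypothetical names, and the word boundaries around the short command names are an assumption that trades a little recall for fewer accidental matches:

```python
import re
from typing import Optional

# Hypothetical fallback table; in the module above the patterns normally
# come from the loaded wordlists.
FALLBACK_PATTERNS = {
    "path_traversal": r"(\.\.|%2e%2e|/etc/passwd|/etc/shadow)",
    "command_injection": r"(cmd=|exec=|command=|&&|;|\||\b(?:whoami|id|uname|cat|ls)\b)",
}


def classify_request(path: str, query: str = "", body: str = "") -> Optional[str]:
    """Return the first matching attack label, checked in table order."""
    full_input = f"{path} {query} {body}"
    for label, pattern in FALLBACK_PATTERNS.items():
        if re.search(pattern, full_input, re.IGNORECASE):
            return label
    return None


if __name__ == "__main__":
    print(classify_request("/download", "file=../../etc/passwd"))
    print(classify_request("/catalog/validate"))
```

With the boundaries in place, a path such as `/catalog/validate` should no longer trip the `cat` or `id` alternatives.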

95
src/dependencies.py Normal file
View File

@@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""
FastAPI dependency injection providers.
Replaces Handler class variables with proper DI.
"""

import os
from datetime import datetime

from fastapi import Request
from fastapi.templating import Jinja2Templates

from config import Config
from tracker import AccessTracker
from database import DatabaseManager, get_database
from logger import get_app_logger, get_access_logger, get_credential_logger

# Shared Jinja2 templates instance
_templates = None


def get_templates() -> Jinja2Templates:
    """Get shared Jinja2Templates instance with custom filters."""
    global _templates
    if _templates is None:
        templates_dir = os.path.join(os.path.dirname(__file__), "templates", "jinja2")
        _templates = Jinja2Templates(directory=templates_dir)
        _templates.env.filters["format_ts"] = _format_ts
    return _templates


def _format_ts(value, time_only=False):
    """Custom Jinja2 filter for formatting ISO timestamps."""
    if not value:
        return "N/A"
    if isinstance(value, str):
        try:
            value = datetime.fromisoformat(value)
        except (ValueError, TypeError):
            return value
    if time_only:
        return value.strftime("%H:%M:%S")
    if value.date() == datetime.now().date():
        return value.strftime("%H:%M:%S")
    return value.strftime("%m/%d/%Y %H:%M:%S")


def get_tracker(request: Request) -> AccessTracker:
    return request.app.state.tracker


def get_app_config(request: Request) -> Config:
    return request.app.state.config


def get_db() -> DatabaseManager:
    return get_database()


def get_client_ip(request: Request) -> str:
    """Extract client IP address from request, checking proxy headers first."""
    forwarded_for = request.headers.get("X-Forwarded-For")
    if forwarded_for:
        return forwarded_for.split(",")[0].strip()

    real_ip = request.headers.get("X-Real-IP")
    if real_ip:
        return real_ip.strip()

    if request.client:
        return request.client.host

    return "0.0.0.0"


def build_raw_request(request: Request, body: str = "") -> str:
    """Build raw HTTP request string for forensic analysis."""
    try:
        raw = f"{request.method} {request.url.path}"
        if request.url.query:
            raw += f"?{request.url.query}"
        raw += f" HTTP/1.1\r\n"

        for header, value in request.headers.items():
            raw += f"{header}: {value}\r\n"

        raw += "\r\n"

        if body:
            raw += body

        return raw
    except Exception as e:
        return f"{request.method} {request.url.path} (error building full request: {str(e)})"
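
The precedence in `get_client_ip` (first `X-Forwarded-For` entry, then `X-Real-IP`, then the socket peer) can be exercised without FastAPI. The sketch below is a stand-in under assumptions: `resolve_client_ip` is a hypothetical helper, and it takes a plain mapping, which is case-sensitive, whereas Starlette's header object is not:

```python
from typing import Mapping, Optional


def resolve_client_ip(headers: Mapping[str, str], peer_host: Optional[str]) -> str:
    """Mirror the lookup order used above, on plain data."""
    forwarded_for = headers.get("X-Forwarded-For")
    if forwarded_for:
        # The header may carry a chain "client, proxy1, proxy2"; take the first hop.
        return forwarded_for.split(",")[0].strip()

    real_ip = headers.get("X-Real-IP")
    if real_ip:
        return real_ip.strip()

    return peer_host or "0.0.0.0"


if __name__ == "__main__":
    print(resolve_client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}, "10.0.0.2"))
```

Both headers are supplied by the client unless a trusted reverse proxy overwrites them, so the resolved address is only as reliable as the proxy in front of the application.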

42
src/firewall/fwtype.py Normal file
View File

@@ -0,0 +1,42 @@
from abc import ABC, abstractmethod
from typing import Dict, Type


class FWType(ABC):
    """Abstract base class for firewall types."""

    # Registry to store child classes
    _registry: Dict[str, Type["FWType"]] = {}

    def __init_subclass__(cls, **kwargs):
        """Automatically register subclasses with their class name."""
        super().__init_subclass__(**kwargs)
        cls._registry[cls.__name__.lower()] = cls

    @classmethod
    def create(cls, fw_type: str, **kwargs) -> "FWType":
        """
        Factory method to create instances of child classes.

        Args:
            fw_type: String name of the firewall type class to instantiate
            **kwargs: Arguments to pass to the child class constructor

        Returns:
            Instance of the requested child class

        Raises:
            ValueError: If fw_type is not registered
        """
        fw_type = fw_type.lower()
        if fw_type not in cls._registry:
            available = ", ".join(cls._registry.keys())
            raise ValueError(
                f"Unknown firewall type: '{fw_type}'. Available: {available}"
            )
        return cls._registry[fw_type](**kwargs)

    @abstractmethod
    def getBanlist(self, ips):
        """Return the ruleset for the specific server"""

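The registry in `FWType` relies on `__init_subclass__`, which Python calls whenever a subclass is defined, so defining (or importing) a subclass is enough to make it available to `create`. A self-contained sketch of the same technique, using the hypothetical names `BanlistFormat` and `Csv` rather than the project's classes:

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BanlistFormat(ABC):
    """Toy base class that registers every subclass under its lowercased name."""

    _registry: Dict[str, Type["BanlistFormat"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[cls.__name__.lower()] = cls

    @classmethod
    def create(cls, name: str, **kwargs) -> "BanlistFormat":
        key = name.lower()
        if key not in cls._registry:
            raise ValueError(f"Unknown format: '{key}'")
        return cls._registry[key](**kwargs)

    @abstractmethod
    def render(self, ips: List[str]) -> str:
        """Return the formatted ban list."""


class Csv(BanlistFormat):
    """Hypothetical subclass; it is registered as 'csv' at definition time."""

    def render(self, ips: List[str]) -> str:
        return ",".join(ips)


if __name__ == "__main__":
    print(BanlistFormat.create("CSV").render(["192.0.2.1", "192.0.2.2"]))
```

One consequence of this design is that a subclass only appears in the registry once its module has been imported, so the caller has to import every firewall module before calling `create`.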
40
src/firewall/iptables.py Normal file
View File

@@ -0,0 +1,40 @@
from typing_extensions import override

from firewall.fwtype import FWType


class Iptables(FWType):

    @override
    def getBanlist(self, ips) -> str:
        """
        Generate iptables ban rules from an array of IP addresses.

        Args:
            ips: List of IP addresses to ban

        Returns:
            String containing iptables commands, one per line
        """
        if not ips:
            return ""

        rules = []
        chain = "INPUT"
        target = "DROP"

        rules.append("#!/bin/bash")
        rules.append("# iptables ban rules")
        rules.append("")

        for ip in ips:
            ip = ip.strip()

            # Build the iptables command
            rule_parts = ["iptables", "-A", chain, "-s", ip]

            # Add target
            rule_parts.extend(["-j", target])

            rules.append(" ".join(rule_parts))

        return "\n".join(rules)
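
Because the generated text is meant to be run as a shell script, it may be worth rejecting anything that is not a literal IP address before it reaches a rule line. A sketch of one way to do that with the standard `ipaddress` module; `build_drop_rules` is a hypothetical helper, not part of the file above, and switching to `ip6tables` for IPv6 entries is an assumption about how the rules would be applied:

```python
import ipaddress
from typing import Iterable, List


def build_drop_rules(ips: Iterable[str]) -> List[str]:
    """Emit one DROP rule per valid address, skipping everything else."""
    rules = []
    for raw in ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            # Not a literal IPv4/IPv6 address (for example "1.2.3.4; rm -rf /")
            continue
        binary = "ip6tables" if addr.version == 6 else "iptables"
        rules.append(f"{binary} -A INPUT -s {str(addr)} -j DROP")
    return rules


if __name__ == "__main__":
    for rule in build_drop_rules([" 198.51.100.4 ", "1.2.3.4; rm -rf /", "2001:db8::1"]):
        print(rule)
```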

21
src/firewall/raw.py Normal file
View File

@@ -0,0 +1,21 @@
from typing_extensions import override

from firewall.fwtype import FWType


class Raw(FWType):

    @override
    def getBanlist(self, ips) -> str:
        """
        Generate raw list of bad IP addresses.

        Args:
            ips: List of IP addresses to ban

        Returns:
            String containing raw ips, one per line
        """
        if not ips:
            return ""

        return "\n".join(ips)

View File

@@ -1,113 +1,117 @@
#!/usr/bin/env python3
"""
Geolocation utilities for reverse geocoding and city lookups.
Geolocation utilities for IP lookups using ip-api.com.
"""
import requests
from typing import Optional, Tuple
from typing import Optional, Dict, Any
from logger import get_app_logger
app_logger = get_app_logger()
# Simple city name cache to avoid repeated API calls
_city_cache = {}
def reverse_geocode_city(latitude: float, longitude: float) -> Optional[str]:
def fetch_ip_geolocation(ip_address: str) -> Optional[Dict[str, Any]]:
"""
Reverse geocode coordinates to get city name using Nominatim (OpenStreetMap).
Fetch geolocation data for an IP address using ip-api.com.
Results are persisted to the database by the caller (fetch_ip_rep task),
so no in-memory caching is needed.
Args:
latitude: Latitude coordinate
longitude: Longitude coordinate
ip_address: IP address to lookup
Returns:
City name or None if not found
Dictionary containing geolocation data or None if lookup fails
"""
# Check cache first
cache_key = f"{latitude},{longitude}"
if cache_key in _city_cache:
return _city_cache[cache_key]
try:
# Use Nominatim reverse geocoding API (free, no API key required)
url = "https://nominatim.openstreetmap.org/reverse"
url = f"http://ip-api.com/json/{ip_address}"
params = {
"lat": latitude,
"lon": longitude,
"format": "json",
"zoom": 10, # City level
"addressdetails": 1,
"fields": "status,message,country,countryCode,region,regionName,city,zip,lat,lon,timezone,isp,org,as,reverse,mobile,proxy,hosting,query"
}
headers = {"User-Agent": "Krawl-Honeypot/1.0"} # Required by Nominatim ToS
response = requests.get(url, params=params, headers=headers, timeout=5)
response = requests.get(url, params=params, timeout=5)
response.raise_for_status()
data = response.json()
address = data.get("address", {})
# Try to get city from various possible fields
city = (
address.get("city")
or address.get("town")
or address.get("village")
or address.get("municipality")
or address.get("county")
)
if data.get("status") != "success":
app_logger.warning(
f"IP lookup failed for {ip_address}: {data.get('message')}"
)
return None
# Cache the result
_city_cache[cache_key] = city
if city:
app_logger.debug(f"Reverse geocoded {latitude},{longitude} to {city}")
return city
app_logger.debug(f"Fetched geolocation for {ip_address}")
return data
except requests.RequestException as e:
app_logger.warning(f"Reverse geocoding failed for {latitude},{longitude}: {e}")
app_logger.warning(f"Geolocation API call failed for {ip_address}: {e}")
return None
except Exception as e:
app_logger.error(f"Error in reverse geocoding: {e}")
app_logger.error(f"Error fetching geolocation for {ip_address}: {e}")
return None
def get_most_recent_geoip_data(results: list) -> Optional[dict]:
def extract_geolocation_from_ip(ip_address: str) -> Optional[Dict[str, Any]]:
"""
Extract the most recent geoip_data from API results.
Results are assumed to be sorted by record_added (most recent first).
Extract geolocation data for an IP address.
Args:
results: List of result dictionaries from IP reputation API
ip_address: IP address to lookup
Returns:
Most recent geoip_data dict or None
Dictionary with city, country, lat, lon, and other geolocation data or None
"""
if not results:
geoloc_data = fetch_ip_geolocation(ip_address)
if not geoloc_data:
return None
# The first result is the most recent (sorted by record_added)
most_recent = results[0]
return most_recent.get("geoip_data")
return {
"city": geoloc_data.get("city"),
"country": geoloc_data.get("country"),
"country_code": geoloc_data.get("countryCode"),
"region": geoloc_data.get("region"),
"region_name": geoloc_data.get("regionName"),
"latitude": geoloc_data.get("lat"),
"longitude": geoloc_data.get("lon"),
"timezone": geoloc_data.get("timezone"),
"isp": geoloc_data.get("isp"),
"org": geoloc_data.get("org"),
"reverse": geoloc_data.get("reverse"),
"is_proxy": geoloc_data.get("proxy"),
"is_hosting": geoloc_data.get("hosting"),
}
def extract_city_from_coordinates(geoip_data: dict) -> Optional[str]:
def fetch_blocklist_data(ip_address: str) -> Optional[Dict[str, Any]]:
"""
Extract city name from geoip_data using reverse geocoding.
Fetch blocklist data for an IP address using lcrawl API.
Args:
geoip_data: Dictionary containing location_latitude and location_longitude
ip_address: IP address to lookup
Returns:
City name or None
Dictionary containing blocklist information or None if lookup fails
"""
if not geoip_data:
return None
# This is now used only for ip reputation
try:
api_url = "https://iprep.lcrawl.com/api/iprep/"
params = {"cidr": ip_address}
headers = {"Content-Type": "application/json"}
response = requests.get(api_url, headers=headers, params=params, timeout=10)
latitude = geoip_data.get("location_latitude")
longitude = geoip_data.get("location_longitude")
if response.status_code == 200:
payload = response.json()
if payload.get("results"):
results = payload["results"]
# Get the most recent result (first in list, sorted by record_added)
most_recent = results[0]
list_on = most_recent.get("list_on", {})
if latitude is None or longitude is None:
return None
app_logger.debug(f"Fetched blocklist data for {ip_address}")
return list_on
except requests.RequestException as e:
app_logger.warning(f"Failed to fetch blocklist data for {ip_address}: {e}")
except Exception as e:
app_logger.error(f"Error processing blocklist data for {ip_address}: {e}")
return reverse_geocode_city(latitude, longitude)
return None
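
The renaming done in `extract_geolocation_from_ip` can be checked offline by separating the field mapping from the HTTP call. A minimal sketch under assumptions: `FIELD_MAP` and `normalize_geolocation` are hypothetical names, and the sample payload in the demo is made up for illustration rather than taken from a real lookup:

```python
from typing import Any, Dict, Optional

# internal key -> ip-api.com field name, following the mapping in the diff above
FIELD_MAP = {
    "city": "city",
    "country": "country",
    "country_code": "countryCode",
    "region": "region",
    "region_name": "regionName",
    "latitude": "lat",
    "longitude": "lon",
    "timezone": "timezone",
    "isp": "isp",
    "org": "org",
    "reverse": "reverse",
    "is_proxy": "proxy",
    "is_hosting": "hosting",
}


def normalize_geolocation(payload: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the renamed fields, or None when the lookup did not succeed."""
    if not payload or payload.get("status") != "success":
        return None
    # Missing fields become None, matching dict.get() in the original code.
    return {target: payload.get(source) for target, source in FIELD_MAP.items()}


if __name__ == "__main__":
    sample = {"status": "success", "city": "Bologna", "countryCode": "IT", "lat": 44.49, "lon": 11.34}
    print(normalize_geolocation(sample))
```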

File diff suppressed because it is too large

View File

@@ -36,12 +36,13 @@ class LoggerManager:
cls._instance._initialized = False
return cls._instance
def initialize(self, log_dir: str = "logs") -> None:
def initialize(self, log_dir: str = "logs", log_level: str = "INFO") -> None:
"""
Initialize the logging system with rotating file handlers.
Args:
log_dir: Directory for log files (created if not exists)
log_level: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
"""
if self._initialized:
return
@@ -59,9 +60,11 @@ class LoggerManager:
max_bytes = 1048576 # 1MB
backup_count = 5
level = getattr(logging, log_level.upper(), logging.INFO)
# Setup application logger
self._app_logger = logging.getLogger("krawl.app")
self._app_logger.setLevel(logging.INFO)
self._app_logger.setLevel(level)
self._app_logger.handlers.clear()
app_file_handler = RotatingFileHandler(
@@ -78,7 +81,7 @@ class LoggerManager:
# Setup access logger
self._access_logger = logging.getLogger("krawl.access")
self._access_logger.setLevel(logging.INFO)
self._access_logger.setLevel(level)
self._access_logger.handlers.clear()
access_file_handler = RotatingFileHandler(
@@ -95,7 +98,7 @@ class LoggerManager:
# Setup credential logger (special format, no stream handler)
self._credential_logger = logging.getLogger("krawl.credentials")
self._credential_logger.setLevel(logging.INFO)
self._credential_logger.setLevel(level)
self._credential_logger.handlers.clear()
# Credential logger uses a simple format: timestamp|ip|username|password|path
@@ -152,6 +155,6 @@ def get_credential_logger() -> logging.Logger:
return _logger_manager.credentials
def initialize_logging(log_dir: str = "logs") -> None:
def initialize_logging(log_dir: str = "logs", log_level: str = "INFO") -> None:
"""Initialize the logging system."""
_logger_manager.initialize(log_dir)
_logger_manager.initialize(log_dir, log_level)
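
The level lookup added in this hunk, `getattr(logging, log_level.upper(), logging.INFO)`, falls back to `INFO` for unknown names. A small sketch of the same idea with one extra guard; `resolve_log_level` is a hypothetical helper, and the `isinstance` check is an assumption added here because `logging` also exposes uppercase attributes that are not levels:

```python
import logging


def resolve_log_level(name: str, default: int = logging.INFO) -> int:
    """Translate a level name such as "debug" into its numeric logging level."""
    level = getattr(logging, name.upper(), default)
    # logging.BASIC_FORMAT, for instance, is a string rather than a level.
    return level if isinstance(level, int) else default


if __name__ == "__main__":
    print(resolve_log_level("debug"), resolve_log_level("verbose"))
```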

View File

@@ -0,0 +1,5 @@
#!/usr/bin/env python3
"""
FastAPI middleware package for the Krawl honeypot.
"""

View File

@@ -0,0 +1,29 @@
#!/usr/bin/env python3
"""
Middleware for checking if client IP is banned.
"""

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response

from dependencies import get_client_ip


class BanCheckMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Skip ban check for dashboard routes
        config = request.app.state.config
        dashboard_prefix = "/" + config.dashboard_secret_path.lstrip("/")
        if request.url.path.startswith(dashboard_prefix):
            return await call_next(request)

        client_ip = get_client_ip(request)
        tracker = request.app.state.tracker

        if tracker.is_banned_ip(client_ip):
            return Response(status_code=500)

        response = await call_next(request)
        return response
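
The dashboard bypass above is a plain `startswith` on the secret prefix, which also matches sibling paths that merely begin with the same characters. A hedged sketch of a stricter check that only accepts the prefix itself or paths below it; `is_dashboard_path` is a hypothetical helper and not part of the middleware:

```python
def is_dashboard_path(request_path: str, secret_path: str) -> bool:
    """True for the dashboard root and anything nested under it."""
    prefix = "/" + secret_path.strip("/")
    return request_path == prefix or request_path.startswith(prefix + "/")


if __name__ == "__main__":
    print(is_dashboard_path("/s3cret/api/stats", "s3cret"))
    print(is_dashboard_path("/s3cret-backup", "s3cret"))
```

Whether the looser match matters depends on how the secret path is chosen; with a long random value the difference is mostly cosmetic.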

102
src/middleware/deception.py Normal file
View File

@@ -0,0 +1,102 @@
#!/usr/bin/env python3
"""
Middleware for deception response detection (path traversal, XXE, command injection).
Short-circuits the request if a deception response is triggered.
"""
import asyncio
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response
from deception_responses import detect_and_respond_deception
from dependencies import get_client_ip, build_raw_request
from logger import get_app_logger, get_access_logger
class DeceptionMiddleware(BaseHTTPMiddleware):
async def dispatch(self, request: Request, call_next):
path = request.url.path
# Skip deception detection for dashboard routes
config = request.app.state.config
dashboard_prefix = "/" + config.dashboard_secret_path.lstrip("/")
if path.startswith(dashboard_prefix):
return await call_next(request)
query = request.url.query or ""
method = request.method
# Read body for POST requests
body = ""
if method == "POST":
body_bytes = await request.body()
body = body_bytes.decode("utf-8", errors="replace")
result = detect_and_respond_deception(path, query, body, method)
if result:
response_body, content_type, status_code = result
client_ip = get_client_ip(request)
user_agent = request.headers.get("User-Agent", "")
app_logger = get_app_logger()
access_logger = get_access_logger()
# Determine attack type for logging
full_input = f"{path} {query} {body}".lower()
attack_type_log = "UNKNOWN"
if (
"passwd" in path.lower()
or "shadow" in path.lower()
or ".." in path
or ".." in query
):
attack_type_log = "PATH_TRAVERSAL"
elif body and ("<!DOCTYPE" in body or "<!ENTITY" in body):
attack_type_log = "XXE_INJECTION"
elif any(
pattern in full_input
for pattern in [
"cmd=",
"exec=",
"command=",
"execute=",
"system=",
";",
"|",
"&&",
"whoami",
"id",
"uname",
"cat",
"ls",
"pwd",
]
):
attack_type_log = "COMMAND_INJECTION"
access_logger.warning(
f"[{attack_type_log} DETECTED] {client_ip} - {path[:100]} - Method: {method}"
)
# Record access
tracker = request.app.state.tracker
tracker.record_access(
ip=client_ip,
path=path,
user_agent=user_agent,
body=body,
method=method,
raw_request=build_raw_request(request, body),
)
return Response(
content=response_body,
status_code=status_code,
media_type=content_type,
)
response = await call_next(request)
return response
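The substring checks used for the log label above match short tokens such as `id`, `ls`, and `cat` anywhere in the input, for example inside `valid` or `category`. A minimal sketch of a stricter labeller follows, assuming it is acceptable to match shell command names only as whole words. `classify_attack` and its regexes are hypothetical and not part of Krawl.

```python
import re

# Hypothetical helper, not part of Krawl: labels an already-detected request for logging.
_PARAM_RE = re.compile(r"\b(?:cmd|exec|command|execute|system)=")
_SHELL_META_RE = re.compile(r";|\||&&")
_COMMAND_RE = re.compile(r"\b(?:whoami|id|uname|cat|ls|pwd)\b")


def classify_attack(path: str, query: str = "", body: str = "") -> str:
    """Return a log label, checking categories in the same order as the middleware."""
    lowered_path = path.lower()
    if (
        "passwd" in lowered_path
        or "shadow" in lowered_path
        or ".." in path
        or ".." in query
    ):
        return "PATH_TRAVERSAL"
    if body and ("<!DOCTYPE" in body or "<!ENTITY" in body):
        return "XXE_INJECTION"
    full_input = f"{path} {query} {body}".lower()
    if (
        _PARAM_RE.search(full_input)
        or _SHELL_META_RE.search(full_input)
        or _COMMAND_RE.search(full_input)
    ):
        return "COMMAND_INJECTION"
    return "UNKNOWN"
```

With word boundaries, a request such as `/validate?category=tools` is no longer labelled as command injection, while `cmd=whoami` or `host=127.0.0.1;id` still is.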

60
src/migrations/README.md Normal file
View File

@@ -0,0 +1,60 @@
# Database Migrations
This directory contains database migration scripts for Krawl.
Since the 1.0.0 stable release, Krawl has added features that require schema changes and performance optimizations. These migration scripts let existing users upgrade in place without data loss or downtime.
## Available Migrations
### add_raw_request_column.py
Adds the `raw_request` column to the `access_logs` table to store complete HTTP requests for forensic analysis.
**Usage:**
```bash
# Run with default database path (src/data/krawl.db)
python3 migrations/add_raw_request_column.py
# Run with custom database path
python3 migrations/add_raw_request_column.py /path/to/krawl.db
```
### add_performance_indexes.py
Adds critical performance indexes to the `attack_detections` table for efficient aggregation and filtering with large datasets (100k+ records).
**Indexes Added:**
- `ix_attack_detections_attack_type` - Speeds up GROUP BY on attack_type
- `ix_attack_detections_type_log` - Composite index for attack_type + access_log_id
**Usage:**
```bash
# Run with default database path
python3 migrations/add_performance_indexes.py
# Run with custom database path
python3 migrations/add_performance_indexes.py /path/to/krawl.db
```
**Post-Migration Optimization:**
```bash
# Compact database and update query planner statistics
sqlite3 /path/to/krawl.db "VACUUM; ANALYZE;"
```
## Running Migrations
All migration scripts are designed to be idempotent and safe to run multiple times. They will:
1. Check if the migration is already applied
2. Skip if already applied
3. Apply the migration if needed
4. Report the result
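
The four steps above can be sketched against an in-memory SQLite database. This is an illustration only: the `demo` table, the `ix_demo_attack_type` index, and the `ensure_index` helper are hypothetical names, not part of the Krawl schema or scripts.

```python
import sqlite3


def index_exists(cursor: sqlite3.Cursor, index_name: str) -> bool:
    """Step 1: check whether the migration is already applied."""
    cursor.execute(
        "SELECT 1 FROM sqlite_master WHERE type='index' AND name=?", (index_name,)
    )
    return cursor.fetchone() is not None


def ensure_index(conn: sqlite3.Connection) -> str:
    """Apply the migration once and report what happened."""
    cursor = conn.cursor()
    if index_exists(cursor, "ix_demo_attack_type"):
        return "skipped"  # Step 2: already applied, nothing to do
    # Step 3: apply the migration
    cursor.execute("CREATE INDEX ix_demo_attack_type ON demo(attack_type)")
    conn.commit()
    return "applied"  # Step 4: report the result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, attack_type TEXT)")
first = ensure_index(conn)
second = ensure_index(conn)
print(first, second)
```

Running the helper twice on the same connection exercises both the apply path and the skip path.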
## Creating New Migrations
When creating a new migration:
1. Name the file descriptively: `action_description.py`
2. Make it idempotent (safe to run multiple times)
3. Add checks before making changes
4. Provide clear error messages
5. Support custom database paths via command line
6. Update this README with usage instructions

View File

@@ -0,0 +1,122 @@
#!/usr/bin/env python3
"""
Migration script to add performance indexes to attack_detections table.
This dramatically improves query performance with large datasets (100k+ records).
"""
import sqlite3
import sys
import os
def index_exists(cursor, index_name: str) -> bool:
"""Check if an index exists."""
cursor.execute(
"SELECT name FROM sqlite_master WHERE type='index' AND name=?", (index_name,)
)
return cursor.fetchone() is not None
def add_performance_indexes(db_path: str) -> bool:
"""
Add performance indexes to optimize queries.
Args:
db_path: Path to the SQLite database file
Returns:
True if indexes were added or already exist, False on error
"""
try:
# Check if database exists
if not os.path.exists(db_path):
print(f"Database file not found: {db_path}")
return False
# Connect to database
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
indexes_added = []
indexes_existed = []
# Index 1: attack_type for efficient GROUP BY operations
if not index_exists(cursor, "ix_attack_detections_attack_type"):
print("Adding index on attack_detections.attack_type...")
cursor.execute("""
CREATE INDEX ix_attack_detections_attack_type
ON attack_detections(attack_type)
""")
indexes_added.append("ix_attack_detections_attack_type")
else:
indexes_existed.append("ix_attack_detections_attack_type")
# Index 2: Composite index for attack_type + access_log_id
if not index_exists(cursor, "ix_attack_detections_type_log"):
print(
"Adding composite index on attack_detections(attack_type, access_log_id)..."
)
cursor.execute("""
CREATE INDEX ix_attack_detections_type_log
ON attack_detections(attack_type, access_log_id)
""")
indexes_added.append("ix_attack_detections_type_log")
else:
indexes_existed.append("ix_attack_detections_type_log")
conn.commit()
conn.close()
# Report results
if indexes_added:
print(f"Successfully added {len(indexes_added)} index(es):")
for idx in indexes_added:
print(f" - {idx}")
if indexes_existed:
print(f" {len(indexes_existed)} index(es) already existed:")
for idx in indexes_existed:
print(f" - {idx}")
if not indexes_added and not indexes_existed:
print("No indexes processed")
return True
except sqlite3.Error as e:
print(f"SQLite error: {e}")
return False
except Exception as e:
print(f"Unexpected error: {e}")
return False
def main():
"""Main migration function."""
# Default database path
default_db_path = os.path.join(
os.path.dirname(os.path.dirname(__file__)), "data", "krawl.db"
)
# Allow custom path as command line argument
db_path = sys.argv[1] if len(sys.argv) > 1 else default_db_path
print(f"Adding performance indexes to database: {db_path}")
print("=" * 60)
success = add_performance_indexes(db_path)
print("=" * 60)
if success:
print("Migration completed successfully")
print("\n💡 Performance tip: Run 'VACUUM' and 'ANALYZE' on your database")
print(" to optimize query planner statistics after adding indexes.")
sys.exit(0)
else:
print("Migration failed")
sys.exit(1)
if __name__ == "__main__":
main()
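
One way to check that such an index is picked up is to ask SQLite for its query plan. The sketch below assumes a reduced `attack_detections` table with only three columns (the real table has more) and an in-memory database; `query_plan` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for illustration only.
conn.execute(
    "CREATE TABLE attack_detections ("
    "id INTEGER PRIMARY KEY, access_log_id INTEGER, attack_type TEXT)"
)
conn.execute(
    "CREATE INDEX ix_attack_detections_attack_type "
    "ON attack_detections(attack_type)"
)


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return the plan steps SQLite reports for a statement, joined into one string."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


filter_plan = query_plan(
    "SELECT COUNT(*) FROM attack_detections WHERE attack_type = ?", ("sqli",)
)
group_plan = query_plan(
    "SELECT attack_type, COUNT(*) FROM attack_detections GROUP BY attack_type"
)
print(filter_plan)
print(group_plan)
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the index name in the output than to compare whole strings.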

View File

@@ -0,0 +1,93 @@
#!/usr/bin/env python3
"""
Migration script to add raw_request column to access_logs table.
This script is safe to run multiple times - it checks if the column exists before adding it.
"""
import sqlite3
import sys
import os
from pathlib import Path
def column_exists(cursor, table_name: str, column_name: str) -> bool:
"""Check if a column exists in a table."""
cursor.execute(f"PRAGMA table_info({table_name})")
columns = [row[1] for row in cursor.fetchall()]
return column_name in columns
def add_raw_request_column(db_path: str) -> bool:
"""
Add raw_request column to access_logs table if it doesn't exist.
Args:
db_path: Path to the SQLite database file
Returns:
True if column was added or already exists, False on error
"""
try:
# Check if database exists
if not os.path.exists(db_path):
print(f"Database file not found: {db_path}")
return False
# Connect to database
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
# Check if column already exists
if column_exists(cursor, "access_logs", "raw_request"):
print("Column 'raw_request' already exists in access_logs table")
conn.close()
return True
# Add the column
print("Adding 'raw_request' column to access_logs table...")
cursor.execute("""
ALTER TABLE access_logs
ADD COLUMN raw_request TEXT
""")
conn.commit()
conn.close()
print("✅ Successfully added 'raw_request' column to access_logs table")
return True
except sqlite3.Error as e:
print(f"SQLite error: {e}")
return False
except Exception as e:
print(f"Unexpected error: {e}")
return False
def main():
"""Main migration function."""
# Default database path
default_db_path = os.path.join(
os.path.dirname(os.path.dirname(__file__)), "data", "krawl.db"
)
# Allow custom path as command line argument
db_path = sys.argv[1] if len(sys.argv) > 1 else default_db_path
print(f"🔄 Running migration on database: {db_path}")
print("=" * 60)
success = add_raw_request_column(db_path)
print("=" * 60)
if success:
print("Migration completed successfully")
sys.exit(0)
else:
print("Migration failed")
sys.exit(1)
if __name__ == "__main__":
main()

127
src/migrations/runner.py Normal file
View File

@@ -0,0 +1,127 @@
"""
Migration runner for Krawl.
Checks the database schema and applies any pending migrations at startup.
All checks are idempotent — safe to run on every boot.
Note: table creation (e.g. category_history) is already handled by
Base.metadata.create_all() in DatabaseManager.initialize() and is NOT
duplicated here. This runner only covers ALTER-level changes that
create_all() cannot apply to existing tables (new columns, new indexes).
"""
import sqlite3
import logging
from typing import List
logger = logging.getLogger("krawl")
def _column_exists(cursor, table_name: str, column_name: str) -> bool:
cursor.execute(f"PRAGMA table_info({table_name})")
columns = [row[1] for row in cursor.fetchall()]
return column_name in columns
def _index_exists(cursor, index_name: str) -> bool:
cursor.execute(
"SELECT name FROM sqlite_master WHERE type='index' AND name=?",
(index_name,),
)
return cursor.fetchone() is not None
def _migrate_raw_request_column(cursor) -> bool:
"""Add raw_request column to access_logs if missing."""
if _column_exists(cursor, "access_logs", "raw_request"):
return False
cursor.execute("ALTER TABLE access_logs ADD COLUMN raw_request TEXT")
return True
def _migrate_need_reevaluation_column(cursor) -> bool:
"""Add need_reevaluation column to ip_stats if missing."""
if _column_exists(cursor, "ip_stats", "need_reevaluation"):
return False
cursor.execute(
"ALTER TABLE ip_stats ADD COLUMN need_reevaluation BOOLEAN DEFAULT 0"
)
return True
def _migrate_ban_state_columns(cursor) -> List[str]:
"""Add ban/rate-limit columns to ip_stats if missing."""
added = []
columns = {
"page_visit_count": "INTEGER DEFAULT 0",
"ban_timestamp": "DATETIME",
"total_violations": "INTEGER DEFAULT 0",
"ban_multiplier": "INTEGER DEFAULT 1",
}
for col_name, col_type in columns.items():
if not _column_exists(cursor, "ip_stats", col_name):
cursor.execute(f"ALTER TABLE ip_stats ADD COLUMN {col_name} {col_type}")
added.append(col_name)
return added
def _migrate_performance_indexes(cursor) -> List[str]:
"""Add performance indexes to attack_detections if missing."""
added = []
if not _index_exists(cursor, "ix_attack_detections_attack_type"):
cursor.execute(
"CREATE INDEX ix_attack_detections_attack_type "
"ON attack_detections(attack_type)"
)
added.append("ix_attack_detections_attack_type")
if not _index_exists(cursor, "ix_attack_detections_type_log"):
cursor.execute(
"CREATE INDEX ix_attack_detections_type_log "
"ON attack_detections(attack_type, access_log_id)"
)
added.append("ix_attack_detections_type_log")
return added
def run_migrations(database_path: str) -> None:
    """
    Check the database schema and apply any pending migrations.
    Only handles ALTER-level changes (columns, indexes) that
    Base.metadata.create_all() cannot apply to existing tables.
    Args:
        database_path: Path to the SQLite database file.
    """
    applied: List[str] = []
    failed = False
    conn = None
    try:
        conn = sqlite3.connect(database_path)
        cursor = conn.cursor()
        if _migrate_raw_request_column(cursor):
            applied.append("add raw_request column to access_logs")
        if _migrate_need_reevaluation_column(cursor):
            applied.append("add need_reevaluation column to ip_stats")
        ban_cols = _migrate_ban_state_columns(cursor)
        for col in ban_cols:
            applied.append(f"add {col} column to ip_stats")
        idx_added = _migrate_performance_indexes(cursor)
        for idx in idx_added:
            applied.append(f"add index {idx}")
        conn.commit()
    except sqlite3.Error as e:
        logger.error(f"Migration error: {e}")
        failed = True
    finally:
        # Close the connection on both the success and the error path
        if conn is not None:
            conn.close()
    for m in applied:
        logger.info(f"Migration applied: {m}")
    if failed:
        # Do not report success or an up-to-date schema after an error
        return
    if applied:
        logger.info(f"All migrations complete ({len(applied)} applied)")
    else:
        logger.info("Database schema is up to date, no migrations needed")
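
The column migrations above rely on SQLite applying a column's `DEFAULT` to rows that already exist when the column is added. A small sketch of that behaviour, assuming a reduced pre-migration `ip_stats` table (the real table has more columns) and a made-up sample row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal pre-migration ip_stats table, for illustration only.
conn.execute("CREATE TABLE ip_stats (ip TEXT PRIMARY KEY, total_requests INTEGER)")
conn.execute("INSERT INTO ip_stats (ip, total_requests) VALUES ('203.0.113.7', 42)")

# Same column definitions the runner uses for two of the ban-state columns.
conn.execute("ALTER TABLE ip_stats ADD COLUMN ban_multiplier INTEGER DEFAULT 1")
conn.execute("ALTER TABLE ip_stats ADD COLUMN ban_timestamp DATETIME")
conn.commit()

row = conn.execute(
    "SELECT ban_multiplier, ban_timestamp FROM ip_stats WHERE ip = '203.0.113.7'"
).fetchone()
print(row)
```

The pre-existing row reads back the declared default for `ban_multiplier` and `NULL` for `ban_timestamp`, which has no default.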

View File

@@ -63,6 +63,8 @@ class AccessLog(Base):
timestamp: Mapped[datetime] = mapped_column(
DateTime, nullable=False, default=datetime.utcnow, index=True
)
# Raw HTTP request for forensic analysis (nullable for backward compatibility)
raw_request: Mapped[Optional[str]] = mapped_column(String, nullable=True)
# Relationship to attack detections
attack_detections: Mapped[List["AttackDetection"]] = relationship(
@@ -126,7 +128,7 @@ class AttackDetection(Base):
nullable=False,
index=True,
)
-    attack_type: Mapped[str] = mapped_column(String(50), nullable=False)
+    attack_type: Mapped[str] = mapped_column(String(50), nullable=False, index=True)
matched_pattern: Mapped[Optional[str]] = mapped_column(
String(MAX_ATTACK_PATTERN_LENGTH), nullable=True
)
@@ -136,6 +138,11 @@ class AttackDetection(Base):
"AccessLog", back_populates="attack_detections"
)
# Composite index for efficient aggregation queries
__table_args__ = (
Index("ix_attack_detections_type_log", "attack_type", "access_log_id"),
)
def __repr__(self) -> str:
return f"<AttackDetection(id={self.id}, type='{self.attack_type}')>"
@@ -162,12 +169,20 @@ class IpStats(Base):
# GeoIP fields (populated by future enrichment)
country_code: Mapped[Optional[str]] = mapped_column(String(2), nullable=True)
city: Mapped[Optional[str]] = mapped_column(String(MAX_CITY_LENGTH), nullable=True)
country: Mapped[Optional[str]] = mapped_column(String(100), nullable=True)
region: Mapped[Optional[str]] = mapped_column(String(2), nullable=True)
region_name: Mapped[Optional[str]] = mapped_column(String(100), nullable=True)
timezone: Mapped[Optional[str]] = mapped_column(String(50), nullable=True)
isp: Mapped[Optional[str]] = mapped_column(String(100), nullable=True)
reverse: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
latitude: Mapped[Optional[float]] = mapped_column(Float, nullable=True)
longitude: Mapped[Optional[float]] = mapped_column(Float, nullable=True)
asn: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
asn_org: Mapped[Optional[str]] = mapped_column(
String(MAX_ASN_ORG_LENGTH), nullable=True
)
is_proxy: Mapped[Optional[bool]] = mapped_column(Boolean, nullable=True)
is_hosting: Mapped[Optional[bool]] = mapped_column(Boolean, nullable=True)
list_on: Mapped[Optional[Dict[str, str]]] = mapped_column(JSON, nullable=True)
# Reputation fields (populated by future enrichment)
@@ -185,6 +200,15 @@ class IpStats(Base):
category_scores: Mapped[Dict[str, int]] = mapped_column(JSON, nullable=True)
manual_category: Mapped[bool] = mapped_column(Boolean, default=False, nullable=True)
last_analysis: Mapped[datetime] = mapped_column(DateTime, nullable=True)
need_reevaluation: Mapped[bool] = mapped_column(
Boolean, default=False, nullable=True
)
# Ban/rate-limit state (moved from in-memory tracker to DB)
page_visit_count: Mapped[int] = mapped_column(Integer, default=0, nullable=True)
ban_timestamp: Mapped[Optional[datetime]] = mapped_column(DateTime, nullable=True)
total_violations: Mapped[int] = mapped_column(Integer, default=0, nullable=True)
ban_multiplier: Mapped[int] = mapped_column(Integer, default=1, nullable=True)
def __repr__(self) -> str:
return f"<IpStats(ip='{self.ip}', total_requests={self.total_requests})>"

5
src/routes/__init__.py Normal file
View File

@@ -0,0 +1,5 @@
#!/usr/bin/env python3
"""
FastAPI routes package for the Krawl honeypot.
"""

319
src/routes/api.py Normal file
View File

@@ -0,0 +1,319 @@
#!/usr/bin/env python3
"""
Dashboard JSON API routes.
Migrated from handler.py dashboard API endpoints.
All endpoints are prefixed with the secret dashboard path.
"""
import os
from fastapi import APIRouter, Request, Response, Query
from fastapi.responses import JSONResponse, PlainTextResponse
from dependencies import get_db
from logger import get_app_logger
router = APIRouter()
def _no_cache_headers() -> dict:
return {
"Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
"Pragma": "no-cache",
"Expires": "0",
"Access-Control-Allow-Origin": "*",
}
@router.get("/api/all-ip-stats")
async def all_ip_stats(request: Request):
db = get_db()
try:
ip_stats_list = db.get_ip_stats(limit=500)
return JSONResponse(
content={"ips": ip_stats_list},
headers=_no_cache_headers(),
)
except Exception as e:
get_app_logger().error(f"Error fetching all IP stats: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/attackers")
async def attackers(
request: Request,
page: int = Query(1),
page_size: int = Query(25),
sort_by: str = Query("total_requests"),
sort_order: str = Query("desc"),
):
db = get_db()
page = max(1, page)
page_size = min(max(1, page_size), 100)
try:
result = db.get_attackers_paginated(
page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching attackers: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/all-ips")
async def all_ips(
request: Request,
page: int = Query(1),
page_size: int = Query(25),
sort_by: str = Query("total_requests"),
sort_order: str = Query("desc"),
):
db = get_db()
page = max(1, page)
page_size = min(max(1, page_size), 10000)
try:
result = db.get_all_ips_paginated(
page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching all IPs: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/ip-stats/{ip_address:path}")
async def ip_stats(ip_address: str, request: Request):
db = get_db()
try:
stats = db.get_ip_stats_by_ip(ip_address)
if stats:
return JSONResponse(content=stats, headers=_no_cache_headers())
else:
return JSONResponse(
content={"error": "IP not found"}, headers=_no_cache_headers()
)
except Exception as e:
get_app_logger().error(f"Error fetching IP stats: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/honeypot")
async def honeypot(
request: Request,
page: int = Query(1),
page_size: int = Query(5),
sort_by: str = Query("count"),
sort_order: str = Query("desc"),
):
db = get_db()
page = max(1, page)
page_size = min(max(1, page_size), 100)
try:
result = db.get_honeypot_paginated(
page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching honeypot data: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/credentials")
async def credentials(
request: Request,
page: int = Query(1),
page_size: int = Query(5),
sort_by: str = Query("timestamp"),
sort_order: str = Query("desc"),
):
db = get_db()
page = max(1, page)
page_size = min(max(1, page_size), 100)
try:
result = db.get_credentials_paginated(
page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching credentials: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/top-ips")
async def top_ips(
request: Request,
page: int = Query(1),
page_size: int = Query(5),
sort_by: str = Query("count"),
sort_order: str = Query("desc"),
):
db = get_db()
page = max(1, page)
page_size = min(max(1, page_size), 100)
try:
result = db.get_top_ips_paginated(
page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching top IPs: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/top-paths")
async def top_paths(
request: Request,
page: int = Query(1),
page_size: int = Query(5),
sort_by: str = Query("count"),
sort_order: str = Query("desc"),
):
db = get_db()
page = max(1, page)
page_size = min(max(1, page_size), 100)
try:
result = db.get_top_paths_paginated(
page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching top paths: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/top-user-agents")
async def top_user_agents(
request: Request,
page: int = Query(1),
page_size: int = Query(5),
sort_by: str = Query("count"),
sort_order: str = Query("desc"),
):
db = get_db()
page = max(1, page)
page_size = min(max(1, page_size), 100)
try:
result = db.get_top_user_agents_paginated(
page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching top user agents: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/attack-types-stats")
async def attack_types_stats(
request: Request,
limit: int = Query(20),
ip_filter: str = Query(None),
):
db = get_db()
limit = min(max(1, limit), 100)
try:
result = db.get_attack_types_stats(limit=limit, ip_filter=ip_filter)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching attack types stats: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/attack-types")
async def attack_types(
request: Request,
page: int = Query(1),
page_size: int = Query(5),
sort_by: str = Query("timestamp"),
sort_order: str = Query("desc"),
):
db = get_db()
page = max(1, page)
page_size = min(max(1, page_size), 100)
try:
result = db.get_attack_types_paginated(
page=page, page_size=page_size, sort_by=sort_by, sort_order=sort_order
)
return JSONResponse(content=result, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching attack types: {e}")
return JSONResponse(content={"error": str(e)}, headers=_no_cache_headers())
@router.get("/api/raw-request/{log_id:int}")
async def raw_request(log_id: int, request: Request):
db = get_db()
try:
raw = db.get_raw_request_by_id(log_id)
if raw is None:
return JSONResponse(
content={"error": "Raw request not found"}, status_code=404
)
return JSONResponse(content={"raw_request": raw}, headers=_no_cache_headers())
except Exception as e:
get_app_logger().error(f"Error fetching raw request: {e}")
return JSONResponse(content={"error": str(e)}, status_code=500)
@router.get("/api/get_banlist")
async def get_banlist(request: Request, fwtype: str = Query("iptables")):
    config = request.app.state.config
    # fwtype is caller-controlled and becomes part of a file name, so restrict it
    # to a safe character set before building the path (blocks "../" traversal
    # and absolute paths).
    if not fwtype or not all(
        c.isascii() and (c.isalnum() or c in "_-") for c in fwtype
    ):
        return PlainTextResponse("File not found", status_code=404)
    filename = f"{fwtype}_banlist.txt"
    if fwtype == "raw":
        filename = "malicious_ips.txt"
    file_path = os.path.join(config.exports_path, filename)
    try:
        if os.path.exists(file_path):
            with open(file_path, "rb") as f:
                content = f.read()
            return Response(
                content=content,
                status_code=200,
                media_type="text/plain",
                headers={
                    "Content-Disposition": f'attachment; filename="{filename}"',
                    "Content-Length": str(len(content)),
                },
            )
        else:
            return PlainTextResponse("File not found", status_code=404)
    except Exception as e:
        get_app_logger().error(f"Error serving banlist file: {e}")
        return PlainTextResponse("Internal server error", status_code=500)
@router.get("/api/download/malicious_ips.txt")
async def download_malicious_ips(request: Request):
config = request.app.state.config
file_path = os.path.join(config.exports_path, "malicious_ips.txt")
try:
if os.path.exists(file_path):
with open(file_path, "rb") as f:
content = f.read()
return Response(
content=content,
status_code=200,
media_type="text/plain",
headers={
"Content-Disposition": 'attachment; filename="malicious_ips.txt"',
"Content-Length": str(len(content)),
},
)
else:
return PlainTextResponse("File not found", status_code=404)
except Exception as e:
get_app_logger().error(f"Error serving malicious IPs file: {e}")
return PlainTextResponse("Internal server error", status_code=500)
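
Most of the paginated endpoints above repeat the same two clamping lines. A minimal sketch of a shared helper follows; `clamp_pagination` is a hypothetical name and is not defined anywhere in this PR.

```python
from typing import Tuple


def clamp_pagination(
    page: int, page_size: int, max_page_size: int = 100
) -> Tuple[int, int]:
    """Clamp page to >= 1 and page_size to the range [1, max_page_size]."""
    return max(1, page), min(max(1, page_size), max_page_size)


# Example: out-of-range values are pulled back into range instead of rejected.
print(clamp_pagination(0, 500))
print(clamp_pagination(1, 50000, max_page_size=10000))
```

Clamping keeps the current behaviour of silently correcting bad input. FastAPI's `Query` also accepts `ge` and `le` bounds, but those reject out-of-range values with a validation error instead of correcting them.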

74
src/routes/dashboard.py Normal file
View File

@@ -0,0 +1,74 @@
#!/usr/bin/env python3
"""
Dashboard page route.
Renders the main dashboard page with server-side data for initial load.
"""
from fastapi import APIRouter, Request
from fastapi.responses import JSONResponse
from logger import get_app_logger
from dependencies import get_db, get_templates
router = APIRouter()
@router.get("")
@router.get("/")
async def dashboard_page(request: Request):
db = get_db()
config = request.app.state.config
dashboard_path = "/" + config.dashboard_secret_path.lstrip("/")
# Get initial data for server-rendered sections
stats = db.get_dashboard_counts()
suspicious = db.get_recent_suspicious(limit=10)
# Get credential count for the stats card
cred_result = db.get_credentials_paginated(page=1, page_size=1)
stats["credential_count"] = cred_result["pagination"]["total"]
templates = get_templates()
return templates.TemplateResponse(
"dashboard/index.html",
{
"request": request,
"dashboard_path": dashboard_path,
"stats": stats,
"suspicious_activities": suspicious,
},
)
@router.get("/ip/{ip_address:path}")
async def ip_page(ip_address: str, request: Request):
db = get_db()
try:
stats = db.get_ip_stats_by_ip(ip_address)
config = request.app.state.config
dashboard_path = "/" + config.dashboard_secret_path.lstrip("/")
if stats:
# Transform fields for template compatibility
list_on = stats.get("list_on") or {}
stats["blocklist_memberships"] = list(list_on.keys()) if list_on else []
stats["reverse_dns"] = stats.get("reverse")
templates = get_templates()
return templates.TemplateResponse(
"dashboard/ip.html",
{
"request": request,
"dashboard_path": dashboard_path,
"stats": stats,
"ip_address": ip_address,
},
)
else:
return JSONResponse(
content={"error": "IP not found"},
)
except Exception as e:
get_app_logger().error(f"Error fetching IP stats: {e}")
return JSONResponse(content={"error": str(e)})

500
src/routes/honeypot.py Normal file
View File

@@ -0,0 +1,500 @@
#!/usr/bin/env python3
"""
Honeypot trap routes for the Krawl deception server.
Migrated from handler.py serve_special_path(), do_POST(), and do_GET() catch-all.
"""
import asyncio
import random
import time
from datetime import datetime
from urllib.parse import urlparse, parse_qs, unquote_plus
from fastapi import APIRouter, Request, Response, Depends
from fastapi.responses import HTMLResponse, PlainTextResponse, JSONResponse
from dependencies import (
get_tracker,
get_app_config,
get_client_ip,
build_raw_request,
)
from config import Config
from tracker import AccessTracker
from templates import html_templates
from generators import (
credentials_txt,
passwords_txt,
users_json,
api_keys_json,
api_response,
directory_listing,
)
from deception_responses import (
generate_sql_error_response,
get_sql_response_with_data,
detect_xss_pattern,
generate_xss_response,
generate_server_error,
)
from wordlists import get_wordlists
from logger import get_app_logger, get_access_logger, get_credential_logger
# --- Auto-tracking dependency ---
# Records requests that match attack patterns or honeypot trap paths.
async def _track_honeypot_request(request: Request):
"""Record access for requests with attack patterns or honeypot path hits."""
tracker = request.app.state.tracker
client_ip = get_client_ip(request)
user_agent = request.headers.get("User-Agent", "")
path = request.url.path
body = ""
if request.method in ("POST", "PUT"):
body_bytes = await request.body()
body = body_bytes.decode("utf-8", errors="replace")
# Check attack patterns in path and body
attack_findings = tracker.detect_attack_type(path)
if body:
import urllib.parse
decoded_body = urllib.parse.unquote(body)
attack_findings.extend(tracker.detect_attack_type(decoded_body))
# Record if attack pattern detected OR path is a honeypot trap
if attack_findings or tracker.is_honeypot_path(path):
tracker.record_access(
ip=client_ip,
path=path,
user_agent=user_agent,
body=body,
method=request.method,
raw_request=build_raw_request(request, body),
)
router = APIRouter(dependencies=[Depends(_track_honeypot_request)])
# --- Helper functions ---
def _should_return_error(config: Config) -> bool:
if config.probability_error_codes <= 0:
return False
return random.randint(1, 100) <= config.probability_error_codes
def _get_random_error_code() -> int:
wl = get_wordlists()
error_codes = wl.error_codes
if not error_codes:
error_codes = [400, 401, 403, 404, 500, 502, 503]
return random.choice(error_codes)
# --- HEAD ---
@router.head("/{path:path}")
async def handle_head(path: str):
return Response(status_code=200, headers={"Content-Type": "text/html"})
# --- POST routes ---
@router.post("/api/search")
@router.post("/api/sql")
@router.post("/api/database")
async def sql_endpoint_post(request: Request):
client_ip = get_client_ip(request)
access_logger = get_access_logger()
body_bytes = await request.body()
post_data = body_bytes.decode("utf-8", errors="replace")
base_path = request.url.path
access_logger.info(
f"[SQL ENDPOINT POST] {client_ip} - {base_path} - Data: {post_data[:100] if post_data else 'empty'}"
)
error_msg, content_type, status_code = generate_sql_error_response(post_data)
if error_msg:
access_logger.warning(
f"[SQL INJECTION DETECTED POST] {client_ip} - {base_path}"
)
return Response(
content=error_msg, status_code=status_code, media_type=content_type
)
else:
response_data = get_sql_response_with_data(base_path, post_data)
return Response(
content=response_data, status_code=200, media_type="application/json"
)
@router.post("/api/contact")
async def contact_post(request: Request):
client_ip = get_client_ip(request)
user_agent = request.headers.get("User-Agent", "")
tracker = request.app.state.tracker
access_logger = get_access_logger()
app_logger = get_app_logger()
body_bytes = await request.body()
post_data = body_bytes.decode("utf-8", errors="replace")
parsed_data = {}
if post_data:
parsed_qs = parse_qs(post_data)
parsed_data = {k: v[0] if v else "" for k, v in parsed_qs.items()}
xss_detected = any(detect_xss_pattern(str(v)) for v in parsed_data.values())
if xss_detected:
access_logger.warning(
f"[XSS ATTEMPT DETECTED] {client_ip} - {request.url.path} - Data: {post_data[:200]}"
)
else:
access_logger.info(f"[XSS ENDPOINT POST] {client_ip} - {request.url.path}")
response_html = generate_xss_response(parsed_data)
return HTMLResponse(content=response_html, status_code=200)
@router.post("/{path:path}")
async def credential_capture_post(request: Request, path: str):
"""Catch-all POST handler for credential capture."""
client_ip = get_client_ip(request)
user_agent = request.headers.get("User-Agent", "")
tracker = request.app.state.tracker
access_logger = get_access_logger()
credential_logger = get_credential_logger()
body_bytes = await request.body()
post_data = body_bytes.decode("utf-8", errors="replace")
full_path = f"/{path}"
access_logger.warning(
f"[LOGIN ATTEMPT] {client_ip} - {full_path} - {user_agent[:50]}"
)
if post_data:
access_logger.warning(f"[POST DATA] {post_data[:200]}")
username, password = tracker.parse_credentials(post_data)
if username or password:
# The trailing "Z" denotes UTC, so format the current UTC time rather than local time
timestamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
credential_line = f"{timestamp}|{client_ip}|{username or 'N/A'}|{password or 'N/A'}|{full_path}"
credential_logger.info(credential_line)
tracker.record_credential_attempt(
client_ip, full_path, username or "N/A", password or "N/A"
)
access_logger.warning(
f"[CREDENTIALS CAPTURED] {client_ip} - Username: {username or 'N/A'} - Path: {full_path}"
)
await asyncio.sleep(1)
return HTMLResponse(content=html_templates.login_error(), status_code=200)
# --- GET special paths ---
@router.get("/robots.txt")
async def robots_txt():
return PlainTextResponse(html_templates.robots_txt())
@router.get("/credentials.txt")
async def fake_credentials():
return PlainTextResponse(credentials_txt())
@router.get("/passwords.txt")
@router.get("/admin_notes.txt")
async def fake_passwords():
return PlainTextResponse(passwords_txt())
@router.get("/users.json")
async def fake_users_json():
    # Routes are matched in registration order, so this handler serves /users.json
    return Response(
        content=users_json(), status_code=200, media_type="application/json"
    )
@router.get("/api_keys.json")
async def fake_api_keys():
return Response(
content=api_keys_json(), status_code=200, media_type="application/json"
)
@router.get("/config.json")
async def fake_config_json():
return Response(
content=api_response("/api/config"),
status_code=200,
media_type="application/json",
)
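# Duplicate registration of /users.json: routes are matched in registration order,
# so the handler registered earlier takes precedence and this one is never reached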
@router.get("/users.json", include_in_schema=False)
async def fake_users_json_content():
return Response(
content=users_json(), status_code=200, media_type="application/json"
)
@router.get("/admin")
@router.get("/admin/")
@router.get("/admin/login")
@router.get("/login")
async def fake_login():
return HTMLResponse(html_templates.login_form())
@router.get("/users")
@router.get("/user")
@router.get("/database")
@router.get("/db")
@router.get("/search")
async def fake_product_search():
return HTMLResponse(html_templates.product_search())
@router.get("/info")
@router.get("/input")
@router.get("/contact")
@router.get("/feedback")
@router.get("/comment")
async def fake_input_form():
return HTMLResponse(html_templates.input_form())
@router.get("/server")
async def fake_server_error():
error_html, content_type = generate_server_error()
return Response(content=error_html, status_code=500, media_type=content_type)
@router.get("/wp-login.php")
@router.get("/wp-login")
@router.get("/wp-admin")
@router.get("/wp-admin/")
async def fake_wp_login():
return HTMLResponse(html_templates.wp_login())
@router.get("/wp-content/{path:path}")
@router.get("/wp-includes/{path:path}")
async def fake_wordpress(path: str = ""):
return HTMLResponse(html_templates.wordpress())
@router.get("/phpmyadmin")
@router.get("/phpmyadmin/{path:path}")
@router.get("/phpMyAdmin")
@router.get("/phpMyAdmin/{path:path}")
@router.get("/pma")
@router.get("/pma/")
async def fake_phpmyadmin(path: str = ""):
return HTMLResponse(html_templates.phpmyadmin())
@router.get("/.env")
async def fake_env():
return Response(
content=api_response("/.env"), status_code=200, media_type="application/json"
)
@router.get("/backup/")
@router.get("/uploads/")
@router.get("/private/")
@router.get("/config/")
@router.get("/database/")
async def fake_directory_listing(request: Request):
return HTMLResponse(directory_listing(request.url.path))
# --- SQL injection honeypot GET endpoints ---
@router.get("/api/search")
@router.get("/api/sql")
@router.get("/api/database")
async def sql_endpoint_get(request: Request):
client_ip = get_client_ip(request)
access_logger = get_access_logger()
app_logger = get_app_logger()
base_path = request.url.path
request_query = request.url.query or ""
error_msg, content_type, status_code = generate_sql_error_response(request_query)
if error_msg:
access_logger.warning(
f"[SQL INJECTION DETECTED] {client_ip} - {base_path} - Query: {request_query[:100] if request_query else 'empty'}"
)
return Response(
content=error_msg, status_code=status_code, media_type=content_type
)
else:
access_logger.info(
f"[SQL ENDPOINT] {client_ip} - {base_path} - Query: {request_query[:100] if request_query else 'empty'}"
)
response_data = get_sql_response_with_data(base_path, request_query)
return Response(
content=response_data, status_code=200, media_type="application/json"
)
# --- Generic /api/* fake endpoints ---
@router.get("/api/{path:path}")
async def fake_api_catchall(request: Request, path: str):
full_path = f"/api/{path}"
return Response(
content=api_response(full_path), status_code=200, media_type="application/json"
)
# --- Catch-all GET (trap pages with random links) ---
# This MUST be registered last in the router
@router.get("/{path:path}")
async def trap_page(request: Request, path: str):
"""Generate trap page with random links. This is the catch-all route."""
config = request.app.state.config
tracker = request.app.state.tracker
app_logger = get_app_logger()
access_logger = get_access_logger()
client_ip = get_client_ip(request)
user_agent = request.headers.get("User-Agent", "")
full_path = f"/{path}" if path else "/"
# Check wordpress-like paths
if "wordpress" in full_path.lower():
return HTMLResponse(html_templates.wordpress())
is_suspicious = tracker.is_suspicious_user_agent(user_agent)
if is_suspicious:
access_logger.warning(
f"[SUSPICIOUS] {client_ip} - {user_agent[:50]} - {full_path}"
)
else:
access_logger.info(f"[REQUEST] {client_ip} - {full_path}")
# Record access unless the router dependency already handled it
# (attack pattern or honeypot path → already recorded by _track_honeypot_request)
if not tracker.detect_attack_type(full_path) and not tracker.is_honeypot_path(
full_path
):
tracker.record_access(
ip=client_ip,
path=full_path,
user_agent=user_agent,
method=request.method,
raw_request=build_raw_request(request) if is_suspicious else "",
)
# Random error response
if _should_return_error(config):
error_code = _get_random_error_code()
access_logger.info(f"Returning error {error_code} to {client_ip} - {full_path}")
return Response(status_code=error_code)
# Response delay
await asyncio.sleep(config.delay / 1000.0)
# Increment page visit counter
current_visit_count = tracker.increment_page_visit(client_ip)
# Generate page
page_html = _generate_page(
config, tracker, client_ip, full_path, current_visit_count, request.app
)
# Decrement canary counter
request.app.state.counter -= 1
if request.app.state.counter < 0:
request.app.state.counter = config.canary_token_tries
return HTMLResponse(content=page_html, status_code=200)
def _generate_page(config, tracker, client_ip, seed, page_visit_count, app) -> str:
"""Generate a webpage containing random links or canary token."""
random.seed(seed)
ip_category = tracker.get_category_by_ip(client_ip)
should_apply_crawler_limit = False
if config.infinite_pages_for_malicious:
if (
ip_category == "good_crawler" or ip_category == "regular_user"
) and page_visit_count >= config.max_pages_limit:
should_apply_crawler_limit = True
else:
if (
ip_category == "good_crawler"
or ip_category == "bad_crawler"
or ip_category == "attacker"
) and page_visit_count >= config.max_pages_limit:
should_apply_crawler_limit = True
if should_apply_crawler_limit:
return html_templates.main_page(
app.state.counter, "<p>Crawl limit reached.</p>"
)
num_pages = random.randint(*config.links_per_page_range)
content = ""
if app.state.counter <= 0 and config.canary_token_url:
content += f"""
<div class="link-box canary-token">
<a href="{config.canary_token_url}">{config.canary_token_url}</a>
</div>
"""
webpages = app.state.webpages
if webpages is None:
for _ in range(num_pages):
address = "".join(
[
random.choice(config.char_space)
for _ in range(random.randint(*config.links_length_range))
]
)
content += f"""
<div class="link-box">
<a href="{address}">{address}</a>
</div>
"""
else:
for _ in range(num_pages):
address = random.choice(webpages)
content += f"""
<div class="link-box">
<a href="{address}">{address}</a>
</div>
"""
return html_templates.main_page(app.state.counter, content)
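
The trap page is reproducible per path because `_generate_page` seeds the random generator with the requested path. Note that `random.seed(seed)` reseeds the shared module-level generator, so later `random` calls in the same process continue from that seed. A minimal sketch of the same idea with a private `random.Random` instance follows. This is a swapped-in variant, not the code in this commit, and `CHAR_SPACE`, `LINKS_LENGTH_RANGE` and `LINKS_PER_PAGE_RANGE` are hypothetical stand-ins for the config values.

```python
import random
import string

# Hypothetical stand-ins for config.char_space, config.links_length_range
# and config.links_per_page_range; the real values come from the config file.
CHAR_SPACE = string.ascii_letters + string.digits
LINKS_LENGTH_RANGE = (5, 15)
LINKS_PER_PAGE_RANGE = (10, 15)


def generate_links(seed: str) -> list[str]:
    """Return a reproducible list of random link names for one path."""
    # A private generator keeps the module-level random state untouched.
    rng = random.Random(seed)
    num_links = rng.randint(*LINKS_PER_PAGE_RANGE)
    return [
        "".join(
            rng.choice(CHAR_SPACE)
            for _ in range(rng.randint(*LINKS_LENGTH_RANGE))
        )
        for _ in range(num_links)
    ]
```

Calling `generate_links("/some/path")` twice returns the same list, which is what makes a crawler see a stable page when it revisits a URL.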

407
src/routes/htmx.py Normal file

@@ -0,0 +1,407 @@
#!/usr/bin/env python3
"""
HTMX fragment endpoints.
Server-rendered HTML partials for table pagination, sorting, IP details, and search.
"""
from fastapi import APIRouter, Request, Response, Query
from dependencies import get_db, get_templates
router = APIRouter()
def _dashboard_path(request: Request) -> str:
config = request.app.state.config
return "/" + config.dashboard_secret_path.lstrip("/")
# ── Honeypot Triggers ────────────────────────────────────────────────
@router.get("/htmx/honeypot")
async def htmx_honeypot(
request: Request,
page: int = Query(1),
sort_by: str = Query("count"),
sort_order: str = Query("desc"),
):
db = get_db()
result = db.get_honeypot_paginated(
page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
)
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/honeypot_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": result["honeypots"],
"pagination": result["pagination"],
"sort_by": sort_by,
"sort_order": sort_order,
},
)
# ── Top IPs ──────────────────────────────────────────────────────────
@router.get("/htmx/top-ips")
async def htmx_top_ips(
request: Request,
page: int = Query(1),
sort_by: str = Query("count"),
sort_order: str = Query("desc"),
):
db = get_db()
result = db.get_top_ips_paginated(
page=max(1, page), page_size=8, sort_by=sort_by, sort_order=sort_order
)
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/top_ips_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": result["ips"],
"pagination": result["pagination"],
"sort_by": sort_by,
"sort_order": sort_order,
},
)
# ── Top Paths ────────────────────────────────────────────────────────
@router.get("/htmx/top-paths")
async def htmx_top_paths(
request: Request,
page: int = Query(1),
sort_by: str = Query("count"),
sort_order: str = Query("desc"),
):
db = get_db()
result = db.get_top_paths_paginated(
page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
)
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/top_paths_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": result["paths"],
"pagination": result["pagination"],
"sort_by": sort_by,
"sort_order": sort_order,
},
)
# ── Top User-Agents ─────────────────────────────────────────────────
@router.get("/htmx/top-ua")
async def htmx_top_ua(
request: Request,
page: int = Query(1),
sort_by: str = Query("count"),
sort_order: str = Query("desc"),
):
db = get_db()
result = db.get_top_user_agents_paginated(
page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
)
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/top_ua_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": result["user_agents"],
"pagination": result["pagination"],
"sort_by": sort_by,
"sort_order": sort_order,
},
)
# ── Attackers ────────────────────────────────────────────────────────
@router.get("/htmx/attackers")
async def htmx_attackers(
request: Request,
page: int = Query(1),
sort_by: str = Query("total_requests"),
sort_order: str = Query("desc"),
):
db = get_db()
result = db.get_attackers_paginated(
page=max(1, page), page_size=25, sort_by=sort_by, sort_order=sort_order
)
# Normalize pagination key (DB returns total_attackers, template expects total)
pagination = result["pagination"]
if "total_attackers" in pagination and "total" not in pagination:
pagination["total"] = pagination["total_attackers"]
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/attackers_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": result["attackers"],
"pagination": pagination,
"sort_by": sort_by,
"sort_order": sort_order,
},
)
# ── Access logs by ip ────────────────────────────────────────────────────────
@router.get("/htmx/access-logs")
async def htmx_access_logs_by_ip(
request: Request,
page: int = Query(1),
sort_by: str = Query("total_requests"),
sort_order: str = Query("desc"),
ip_filter: str = Query(None),
):
db = get_db()
result = db.get_access_logs_paginated(
page=max(1, page), page_size=25, ip_filter=ip_filter
)
# Normalize pagination key (DB returns total_access_logs, template expects total)
pagination = result["pagination"]
if "total_access_logs" in pagination and "total" not in pagination:
pagination["total"] = pagination["total_access_logs"]
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/access_by_ip_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": result["access_logs"],
"pagination": pagination,
"sort_by": sort_by,
"sort_order": sort_order,
"ip_filter": ip_filter,
},
)
# ── Credentials ──────────────────────────────────────────────────────
@router.get("/htmx/credentials")
async def htmx_credentials(
request: Request,
page: int = Query(1),
sort_by: str = Query("timestamp"),
sort_order: str = Query("desc"),
):
db = get_db()
result = db.get_credentials_paginated(
page=max(1, page), page_size=5, sort_by=sort_by, sort_order=sort_order
)
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/credentials_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": result["credentials"],
"pagination": result["pagination"],
"sort_by": sort_by,
"sort_order": sort_order,
},
)
# ── Attack Types ─────────────────────────────────────────────────────
@router.get("/htmx/attacks")
async def htmx_attacks(
request: Request,
page: int = Query(1),
sort_by: str = Query("timestamp"),
sort_order: str = Query("desc"),
ip_filter: str = Query(None),
):
db = get_db()
result = db.get_attack_types_paginated(
page=max(1, page),
page_size=5,
sort_by=sort_by,
sort_order=sort_order,
ip_filter=ip_filter,
)
# Transform attack data for template (join attack_types list, map id to log_id)
items = []
for attack in result["attacks"]:
items.append(
{
"ip": attack["ip"],
"path": attack["path"],
"attack_type": ", ".join(attack.get("attack_types", [])),
"user_agent": attack.get("user_agent", ""),
"timestamp": attack.get("timestamp"),
"log_id": attack.get("id"),
}
)
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/attack_types_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": items,
"pagination": result["pagination"],
"sort_by": sort_by,
"sort_order": sort_order,
"ip_filter": ip_filter or "",
},
)
# ── Attack Patterns ──────────────────────────────────────────────────
@router.get("/htmx/patterns")
async def htmx_patterns(
request: Request,
page: int = Query(1),
):
db = get_db()
page = max(1, page)
page_size = 10
# Get all attack type stats and paginate manually
result = db.get_attack_types_stats(limit=100)
all_patterns = [
{"pattern": item["type"], "count": item["count"]}
for item in result.get("attack_types", [])
]
total = len(all_patterns)
total_pages = max(1, (total + page_size - 1) // page_size)
offset = (page - 1) * page_size
items = all_patterns[offset : offset + page_size]
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/patterns_table.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"items": items,
"pagination": {
"page": page,
"page_size": page_size,
"total": total,
"total_pages": total_pages,
},
},
)
# ── IP Insight (full IP page as partial) ─────────────────────────────
@router.get("/htmx/ip-insight/{ip_address:path}")
async def htmx_ip_insight(ip_address: str, request: Request):
db = get_db()
stats = db.get_ip_stats_by_ip(ip_address)
if not stats:
stats = {"ip": ip_address, "total_requests": "N/A"}
# Transform fields for template compatibility
list_on = stats.get("list_on") or {}
stats["blocklist_memberships"] = list(list_on.keys()) if list_on else []
stats["reverse_dns"] = stats.get("reverse")
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/ip_insight.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"stats": stats,
"ip_address": ip_address,
},
)
# ── IP Detail ────────────────────────────────────────────────────────
@router.get("/htmx/ip-detail/{ip_address:path}")
async def htmx_ip_detail(ip_address: str, request: Request):
db = get_db()
stats = db.get_ip_stats_by_ip(ip_address)
if not stats:
stats = {"ip": ip_address, "total_requests": "N/A"}
# Transform fields for template compatibility
list_on = stats.get("list_on") or {}
stats["blocklist_memberships"] = list(list_on.keys()) if list_on else []
stats["reverse_dns"] = stats.get("reverse")
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/ip_detail.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"stats": stats,
},
)
# ── Search ───────────────────────────────────────────────────────────
@router.get("/htmx/search")
async def htmx_search(
request: Request,
q: str = Query(""),
page: int = Query(1),
):
q = q.strip()
if not q:
return Response(content="", media_type="text/html")
db = get_db()
result = db.search_attacks_and_ips(query=q, page=max(1, page), page_size=20)
templates = get_templates()
return templates.TemplateResponse(
"dashboard/partials/search_results.html",
{
"request": request,
"dashboard_path": _dashboard_path(request),
"attacks": result["attacks"],
"ips": result["ips"],
"query": q,
"pagination": result["pagination"],
},
)
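
The `/htmx/patterns` endpoint paginates an in-memory list by hand instead of delegating to the database. The arithmetic it uses (ceiling division for the page count, slice for the page) can be sketched as a standalone helper. The function name `paginate` and its return shape are assumptions made for illustration, not part of this commit.

```python
def paginate(items: list, page: int, page_size: int) -> dict:
    """Slice `items` into one page and report pagination metadata."""
    page = max(1, page)  # clamp like the endpoints do
    total = len(items)
    # Ceiling division; an empty list still reports one (empty) page.
    total_pages = max(1, (total + page_size - 1) // page_size)
    offset = (page - 1) * page_size
    return {
        "items": items[offset : offset + page_size],
        "pagination": {
            "page": page,
            "page_size": page_size,
            "total": total,
            "total_pages": total_pages,
        },
    }
```

A page number past the end yields an empty `items` list rather than an error, which matches the slicing behaviour in the endpoint.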


@@ -1,141 +0,0 @@
#!/usr/bin/env python3
"""
Main server module for the deception honeypot.
Run this file to start the server.
"""
import sys
from http.server import HTTPServer
from config import get_config
from tracker import AccessTracker
from analyzer import Analyzer
from handler import Handler
from logger import (
initialize_logging,
get_app_logger,
get_access_logger,
get_credential_logger,
)
from database import initialize_database
from tasks_master import get_tasksmaster
def print_usage():
"""Print usage information"""
print(f"Usage: {sys.argv[0]} [FILE]\n")
print("FILE is file containing a list of webpage names to serve, one per line.")
print("If no file is provided, random links will be generated.\n")
print("Configuration:")
print(" Configuration is loaded from a YAML file (default: config.yaml)")
print("Set CONFIG_LOCATION environment variable to use a different file.\n")
print("Example config.yaml structure:")
print("server:")
print("port: 5000")
print("delay: 100")
print("links:")
print("min_length: 5")
print("max_length: 15")
print("min_per_page: 10")
print("max_per_page: 15")
print("canary:")
print("token_url: null")
print("token_tries: 10")
print("dashboard:")
print("secret_path: null # auto-generated if not set")
print("database:")
print('path: "data/krawl.db"')
print("retention_days: 30")
print("behavior:")
print("probability_error_codes: 0")
def main():
"""Main entry point for the deception server"""
if "-h" in sys.argv or "--help" in sys.argv:
print_usage()
exit(0)
config = get_config()
# Initialize logging with timezone
initialize_logging()
app_logger = get_app_logger()
access_logger = get_access_logger()
credential_logger = get_credential_logger()
# Initialize database for persistent storage
try:
initialize_database(config.database_path)
app_logger.info(f"Database initialized at: {config.database_path}")
except Exception as e:
app_logger.warning(
f"Database initialization failed: {e}. Continuing with in-memory only."
)
tracker = AccessTracker(config.max_pages_limit, config.ban_duration_seconds)
analyzer = Analyzer()
Handler.config = config
Handler.tracker = tracker
Handler.analyzer = analyzer
Handler.counter = config.canary_token_tries
Handler.app_logger = app_logger
Handler.access_logger = access_logger
Handler.credential_logger = credential_logger
if len(sys.argv) == 2:
try:
with open(sys.argv[1], "r") as f:
Handler.webpages = f.readlines()
if not Handler.webpages:
app_logger.warning(
"The file provided was empty. Using randomly generated links."
)
Handler.webpages = None
except IOError:
app_logger.warning("Can't read input file. Using randomly generated links.")
# tasks master init
tasks_master = get_tasksmaster()
tasks_master.run_scheduled_tasks()
try:
banner = f"""
============================================================
DASHBOARD AVAILABLE AT
{config.dashboard_secret_path}
============================================================
"""
app_logger.info(banner)
app_logger.info(f"Starting deception server on port {config.port}...")
if config.canary_token_url:
app_logger.info(
f"Canary token will appear after {config.canary_token_tries} tries"
)
else:
app_logger.info(
"No canary token configured (set CANARY_TOKEN_URL to enable)"
)
server = HTTPServer(("0.0.0.0", config.port), Handler)
app_logger.info("Server started. Use <Ctrl-C> to stop.")
server.serve_forever()
except KeyboardInterrupt:
app_logger.info("Stopping server...")
server.socket.close()
app_logger.info("Server stopped")
except Exception as e:
app_logger.error(f"Error starting HTTP server on port {config.port}: {e}")
app_logger.error(
f"Make sure you are root, if needed, and that port {config.port} is open."
)
exit(1)
if __name__ == "__main__":
main()


@@ -1,65 +0,0 @@
#!/usr/bin/env python3
import random
from wordlists import get_wordlists
def generate_server_error() -> tuple[str, str]:
wl = get_wordlists()
server_errors = wl.server_errors
if not server_errors:
return ("500 Internal Server Error", "text/html")
server_type = random.choice(list(server_errors.keys()))
server_config = server_errors[server_type]
error_codes = {
400: "Bad Request",
401: "Unauthorized",
403: "Forbidden",
404: "Not Found",
500: "Internal Server Error",
502: "Bad Gateway",
503: "Service Unavailable",
}
code = random.choice(list(error_codes.keys()))
message = error_codes[code]
template = server_config.get("template", "")
version = random.choice(server_config.get("versions", ["1.0"]))
html = template.replace("{code}", str(code))
html = html.replace("{message}", message)
html = html.replace("{version}", version)
if server_type == "apache":
os = random.choice(server_config.get("os", ["Ubuntu"]))
html = html.replace("{os}", os)
html = html.replace("{host}", "localhost")
return (html, "text/html")
def get_server_header(server_type: str = None) -> str:
wl = get_wordlists()
server_errors = wl.server_errors
if not server_errors:
return "nginx/1.18.0"
if not server_type:
server_type = random.choice(list(server_errors.keys()))
server_config = server_errors.get(server_type, {})
version = random.choice(server_config.get("versions", ["1.0"]))
server_headers = {
"nginx": f"nginx/{version}",
"apache": f"Apache/{version}",
"iis": f"Microsoft-IIS/{version}",
"tomcat": f"Apache-Coyote/1.1",
}
return server_headers.get(server_type, "nginx/1.18.0")
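
`generate_server_error` fills its templates with chained `str.replace` calls. One plausible reason, stated here as an assumption, is that error templates often contain literal braces (inline CSS, for example), which `str.format` would try to interpret as replacement fields. A minimal sketch with a hypothetical helper name and a made-up template:

```python
def render_error(template: str, code: int, message: str, version: str) -> str:
    """Substitute only the known placeholders, leaving other braces alone."""
    replacements = {
        "{code}": str(code),
        "{message}": message,
        "{version}": version,
    }
    html = template
    for placeholder, value in replacements.items():
        html = html.replace(placeholder, value)
    return html


# Made-up template for illustration; real templates come from the wordlists.
EXAMPLE_TEMPLATE = (
    "<h1>{code} {message}</h1>"
    "<style>body{margin:0}</style>"
    "<address>nginx/{version}</address>"
)
```

With this approach the `body{margin:0}` rule passes through unchanged, whereas `EXAMPLE_TEMPLATE.format(...)` would treat `{margin:0}` as a field.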


@@ -1,115 +0,0 @@
#!/usr/bin/env python3
import random
import re
from typing import Optional, Tuple
from wordlists import get_wordlists
def detect_sql_injection_pattern(query_string: str) -> Optional[str]:
if not query_string:
return None
query_lower = query_string.lower()
patterns = {
"quote": [r"'", r'"', r"`"],
"comment": [r"--", r"#", r"/\*", r"\*/"],
"union": [r"\bunion\b", r"\bunion\s+select\b"],
"boolean": [r"\bor\b.*=.*", r"\band\b.*=.*", r"'.*or.*'.*=.*'"],
"time_based": [r"\bsleep\b", r"\bwaitfor\b", r"\bdelay\b", r"\bbenchmark\b"],
"stacked": [r";.*select", r";.*drop", r";.*insert", r";.*update", r";.*delete"],
"command": [r"\bexec\b", r"\bexecute\b", r"\bxp_cmdshell\b"],
"info_schema": [r"information_schema", r"table_schema", r"table_name"],
}
for injection_type, pattern_list in patterns.items():
for pattern in pattern_list:
if re.search(pattern, query_lower):
return injection_type
return None
def get_random_sql_error(
db_type: str = None, injection_type: str = None
) -> Tuple[str, str]:
wl = get_wordlists()
sql_errors = wl.sql_errors
if not sql_errors:
return ("Database error occurred", "text/plain")
if not db_type:
db_type = random.choice(list(sql_errors.keys()))
db_errors = sql_errors.get(db_type, {})
if injection_type and injection_type in db_errors:
errors = db_errors[injection_type]
elif "generic" in db_errors:
errors = db_errors["generic"]
else:
all_errors = []
for error_list in db_errors.values():
if isinstance(error_list, list):
all_errors.extend(error_list)
errors = all_errors if all_errors else ["Database error occurred"]
error_message = random.choice(errors) if errors else "Database error occurred"
if "{table}" in error_message:
tables = ["users", "products", "orders", "customers", "accounts", "sessions"]
error_message = error_message.replace("{table}", random.choice(tables))
if "{column}" in error_message:
columns = ["id", "name", "email", "password", "username", "created_at"]
error_message = error_message.replace("{column}", random.choice(columns))
return (error_message, "text/plain")
def generate_sql_error_response(
query_string: str, db_type: str = None
) -> Tuple[str, str, int]:
injection_type = detect_sql_injection_pattern(query_string)
if not injection_type:
return (None, None, None)
error_message, content_type = get_random_sql_error(db_type, injection_type)
status_code = 500
if random.random() < 0.3:
status_code = 200
return (error_message, content_type, status_code)
def get_sql_response_with_data(path: str, params: str) -> str:
import json
from generators import random_username, random_email, random_password
injection_type = detect_sql_injection_pattern(params)
if injection_type in ["union", "boolean", "stacked"]:
data = {
"success": True,
"results": [
{
"id": i,
"username": random_username(),
"email": random_email(),
"password_hash": random_password(),
"role": random.choice(["admin", "user", "moderator"]),
}
for i in range(1, random.randint(2, 5))
],
}
return json.dumps(data, indent=2)
return json.dumps(
{"success": True, "message": "Query executed successfully", "results": []},
indent=2,
)
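
`detect_sql_injection_pattern` returns the first category whose pattern matches, and the categories are tried in dictionary insertion order. Because `"quote"` comes first, a payload that contains both a quote and a `UNION` is reported as `"quote"`. The sketch below uses a trimmed copy of the pattern table and a hypothetical `classify` helper with an explicit `order` argument to make that effect visible.

```python
import re
from typing import Optional

# Trimmed copy of the pattern table, for illustration only.
PATTERNS = {
    "quote": [r"'", r'"', r"`"],
    "union": [r"\bunion\b", r"\bunion\s+select\b"],
    "stacked": [r";.*select", r";.*drop"],
}


def classify(query_string: str, order: list[str]) -> Optional[str]:
    """Return the first category in `order` with a matching pattern."""
    if not query_string:
        return None
    query_lower = query_string.lower()
    for name in order:
        for pattern in PATTERNS[name]:
            if re.search(pattern, query_lower):
                return name
    return None
```

Trying the more specific categories first, for example `["union", "stacked", "quote"]`, changes how the same payload is labelled. Whether that ordering is preferable depends on how the label is consumed downstream.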


@@ -1,7 +1,5 @@
from sqlalchemy import select
from typing import Optional
from database import get_database, DatabaseManager
from zoneinfo import ZoneInfo
from collections import Counter
from database import get_database
from pathlib import Path
from datetime import datetime, timedelta
import re
@@ -9,8 +7,6 @@ import urllib.parse
from wordlists import get_wordlists
from config import get_config
from logger import get_app_logger
import requests
from sanitizer import sanitize_for_storage, sanitize_dict
# ----------------------
# TASK CONFIG
@@ -74,7 +70,7 @@ def main():
"risky_http_methods": 6,
"robots_violations": 4,
"uneven_request_timing": 3,
"different_user_agents": 8,
"different_user_agents": 2,
"attack_url": 15,
},
"good_crawler": {
@@ -88,7 +84,7 @@ def main():
"risky_http_methods": 2,
"robots_violations": 7,
"uneven_request_timing": 0,
"different_user_agents": 5,
"different_user_agents": 7,
"attack_url": 5,
},
"regular_user": {
@@ -99,67 +95,45 @@ def main():
"attack_url": 0,
},
}
# Get IPs with recent activity (last minute to match cron schedule)
recent_accesses = db_manager.get_access_logs(limit=999999999, since_minutes=1)
ips_to_analyze = {item["ip"] for item in recent_accesses}
# Parse robots.txt once before the loop (it never changes during a run)
robots_disallows = []
robots_path = Path(__file__).parent.parent / "templates" / "html" / "robots.txt"
with open(robots_path, "r") as f:
for line in f:
line = line.strip()
if not line:
continue
parts = line.split(":")
if parts[0] == "Disallow":
parts[1] = parts[1].rstrip("/")
robots_disallows.append(parts[1].strip())
# Get IPs flagged for reevaluation (set when a suspicious request arrives)
ips_to_analyze = set(db_manager.get_ips_needing_reevaluation())
if not ips_to_analyze:
app_logger.debug("[Background Task] analyze-ips: No recent activity, skipping")
app_logger.debug(
"[Background Task] analyze-ips: No IPs need reevaluation, skipping"
)
return
for ip in ips_to_analyze:
# Get full history for this IP to perform accurate analysis
ip_accesses = db_manager.get_access_logs(limit=999999999, ip_filter=ip)
ip_accesses = db_manager.get_access_logs(
limit=10000, ip_filter=ip, since_minutes=1440 * 30
) # look back up to 30 days of history for better accuracy
total_accesses_count = len(ip_accesses)
if total_accesses_count <= 0:
return
continue
# Set category as "unknown" for the first 3 requests
if total_accesses_count < 3:
category = "unknown"
analyzed_metrics = {}
category_scores = {
"attacker": 0,
"good_crawler": 0,
"bad_crawler": 0,
"regular_user": 0,
"unknown": 0,
}
last_analysis = datetime.now()
db_manager.update_ip_stats_analysis(
ip, analyzed_metrics, category, category_scores, last_analysis
)
return 0
# --------------------- HTTP Methods ---------------------
get_accesses_count = len(
[item for item in ip_accesses if item["method"] == "GET"]
)
post_accesses_count = len(
[item for item in ip_accesses if item["method"] == "POST"]
)
put_accesses_count = len(
[item for item in ip_accesses if item["method"] == "PUT"]
)
delete_accesses_count = len(
[item for item in ip_accesses if item["method"] == "DELETE"]
)
head_accesses_count = len(
[item for item in ip_accesses if item["method"] == "HEAD"]
)
options_accesses_count = len(
[item for item in ip_accesses if item["method"] == "OPTIONS"]
)
patch_accesses_count = len(
[item for item in ip_accesses if item["method"] == "PATCH"]
)
method_counts = Counter(item["method"] for item in ip_accesses)
if total_accesses_count > http_risky_methods_threshold:
http_method_attacker_score = (
post_accesses_count
+ put_accesses_count
+ delete_accesses_count
+ options_accesses_count
+ patch_accesses_count
) / total_accesses_count
risky_count = sum(
method_counts.get(m, 0)
for m in ("POST", "PUT", "DELETE", "OPTIONS", "PATCH")
)
http_method_attacker_score = risky_count / total_accesses_count
else:
http_method_attacker_score = 0
# print(f"HTTP Method attacker score: {http_method_attacker_score}")
@@ -174,21 +148,6 @@ def main():
score["bad_crawler"]["risky_http_methods"] = False
score["regular_user"]["risky_http_methods"] = False
# --------------------- Robots Violations ---------------------
# respect robots.txt and login/config pages access frequency
robots_disallows = []
robots_path = Path(__file__).parent.parent / "templates" / "html" / "robots.txt"
with open(robots_path, "r") as f:
for line in f:
line = line.strip()
if not line:
continue
parts = line.split(":")
if parts[0] == "Disallow":
parts[1] = parts[1].rstrip("/")
# print(f"DISALLOW {parts[1]}")
robots_disallows.append(parts[1].strip())
# if 0 100% sure is good crawler, if >10% of robots violated is bad crawler or attacker
violated_robots_count = len(
[
item
@@ -261,7 +220,7 @@ def main():
if len(user_agents_used) >= user_agents_used_threshold:
score["attacker"]["different_user_agents"] = True
score["good_crawler"]["different_user_agents"] = False
score["bad_crawler"]["different_user_agentss"] = True
score["bad_crawler"]["different_user_agents"] = True
score["regular_user"]["different_user_agents"] = False
else:
score["attacker"]["different_user_agents"] = False
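
The hunks above replace seven separate list comprehensions with a single `Counter` pass over the access log. A minimal, self-contained sketch of that tally follows. The helper name `risky_method_ratio` and the `min_requests` parameter are assumptions standing in for the task's inline code and `http_risky_methods_threshold`.

```python
from collections import Counter

RISKY_METHODS = ("POST", "PUT", "DELETE", "OPTIONS", "PATCH")


def risky_method_ratio(accesses: list[dict], min_requests: int) -> float:
    """Share of requests using risky HTTP methods, or 0.0 below the threshold."""
    total = len(accesses)
    if total <= min_requests:
        return 0.0
    # One pass over the log instead of one pass per method.
    method_counts = Counter(item["method"] for item in accesses)
    risky_count = sum(method_counts.get(m, 0) for m in RISKY_METHODS)
    return risky_count / total
```

For example, a log with six `GET`, three `POST` and one `DELETE` request gives a ratio of 0.4 once the request count exceeds `min_requests`.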

102
src/tasks/db_dump.py Normal file

@@ -0,0 +1,102 @@
# tasks/db_dump.py
from logger import get_app_logger
from database import get_database
from config import get_config
from sqlalchemy import MetaData
from sqlalchemy.schema import CreateTable
import os
config = get_config()
app_logger = get_app_logger()
# ----------------------
# TASK CONFIG
# ----------------------
TASK_CONFIG = {
"name": "dump-krawl-data",
"cron": f"{config.backups_cron}",
"enabled": config.backups_enabled,
"run_when_loaded": True,
}
# ----------------------
# TASK LOGIC
# ----------------------
def main():
"""
Dump krawl database to a sql file for backups
"""
task_name = TASK_CONFIG.get("name")
app_logger.info(f"[Background Task] {task_name} starting...")
try:
db = get_database()
engine = db._engine
metadata = MetaData()
metadata.reflect(bind=engine)
# create backup directory
os.makedirs(config.backups_path, exist_ok=True)
output_file = os.path.join(config.backups_path, "db_dump.sql")
with open(output_file, "w") as f:
# Write header
app_logger.info(f"[Background Task] {task_name} started database dump")
# Dump schema (CREATE TABLE statements)
f.write("-- Schema\n")
f.write("-- " + "=" * 70 + "\n\n")
for table_name in metadata.tables:
table = metadata.tables[table_name]
app_logger.info(
f"[Background Task] {task_name} dumping {table} table schema"
)
# Create table statement
create_stmt = str(CreateTable(table).compile(engine))
f.write(f"{create_stmt};\n\n")
f.write("\n-- Data\n")
f.write("-- " + "=" * 70 + "\n\n")
with engine.connect() as conn:
for table_name in metadata.tables:
table = metadata.tables[table_name]
f.write(f"-- Table: {table_name}\n")
# Select all data from table
result = conn.execute(table.select())
rows = result.fetchall()
if rows:
app_logger.info(
f"[Background Task] {task_name} dumping {table} content"
)
for row in rows:
# Build INSERT statement
columns = ", ".join([col.name for col in table.columns])
values = ", ".join([repr(value) for value in row])
f.write(
f"INSERT INTO {table_name} ({columns}) VALUES ({values});\n"
)
f.write("\n")
else:
f.write(f"-- No data in {table_name}\n\n")
app_logger.info(
f"[Background Task] {task_name} no data in {table}"
)
app_logger.info(
f"[Background Task] {task_name} Database dump completed: {output_file}"
)
except Exception as e:
app_logger.error(f"[Background Task] {task_name} failed: {e}")
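finally:
# db is unbound if get_database() itself raised, so guard the cleanup
try:
db.close_session()
except Exception as e:
app_logger.error(
f"[Background Task] {task_name} error closing DB session: {e}"
)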
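
The dump task builds each `INSERT` with `repr(value)`, which yields Python literals: `None` instead of `NULL`, `datetime.datetime(...)` for timestamps, and backslash escapes rather than doubled quotes in strings. A hedged sketch of a literal formatter is shown below. The helper `sql_literal` is hypothetical, and it assumes a SQLite-style dialect (suggested by the `data/krawl.db` default path, but not confirmed by this diff).

```python
from datetime import date, datetime


def sql_literal(value) -> str:
    """Format a Python value as a SQL literal (SQLite-style, assumed dialect)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, (bytes, bytearray)):
        return "X'" + bytes(value).hex() + "'"
    if isinstance(value, datetime):  # check before date: datetime is a date
        return "'" + value.isoformat(sep=" ") + "'"
    if isinstance(value, date):
        return "'" + value.isoformat() + "'"
    # Strings and anything else: quote and double embedded single quotes.
    return "'" + str(value).replace("'", "''") + "'"
```

In the dump loop this would replace `repr(value)` in the `values` join. Special floats such as `inf` and `nan` are not handled here and would need a dialect-specific decision.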

81
src/tasks/db_retention.py Normal file

@@ -0,0 +1,81 @@
#!/usr/bin/env python3
"""
Database retention task for Krawl honeypot.
Periodically deletes old records based on configured retention_days.
"""
from datetime import datetime, timedelta
from database import get_database
from logger import get_app_logger
# ----------------------
# TASK CONFIG
# ----------------------
TASK_CONFIG = {
"name": "db-retention",
"cron": "0 3 * * *", # Run daily at 3 AM
"enabled": True,
"run_when_loaded": False,
}
app_logger = get_app_logger()
def main():
"""
Delete access logs, credential attempts, and attack detections
older than the configured retention period.
"""
try:
from config import get_config
from models import AccessLog, CredentialAttempt, AttackDetection
config = get_config()
retention_days = config.database_retention_days
db = get_database()
session = db.session
cutoff = datetime.now() - timedelta(days=retention_days)
# Delete attack detections linked to old access logs first (FK constraint)
old_log_ids = session.query(AccessLog.id).filter(AccessLog.timestamp < cutoff)
detections_deleted = (
session.query(AttackDetection)
.filter(AttackDetection.access_log_id.in_(old_log_ids))
.delete(synchronize_session=False)
)
# Delete old access logs
logs_deleted = (
session.query(AccessLog)
.filter(AccessLog.timestamp < cutoff)
.delete(synchronize_session=False)
)
# Delete old credential attempts
creds_deleted = (
session.query(CredentialAttempt)
.filter(CredentialAttempt.timestamp < cutoff)
.delete(synchronize_session=False)
)
session.commit()
if logs_deleted or creds_deleted or detections_deleted:
app_logger.info(
f"DB retention: Deleted {logs_deleted} access logs, "
f"{detections_deleted} attack detections, "
f"{creds_deleted} credential attempts older than {retention_days} days"
)
except Exception as e:
app_logger.error(f"Error during DB retention cleanup: {e}")
finally:
try:
db.close_session()
except Exception as e:
app_logger.error(f"Error closing DB session after retention cleanup: {e}")
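
The retention task above deletes attack detections before the access logs they reference, so the foreign key is never violated. A minimal sketch of that ordering with the standard-library `sqlite3` module follows. The table and column names are simplified assumptions made for illustration, not Krawl's actual schema, and `purge_older_than` is a hypothetical function name.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_older_than(conn, retention_days, now=None):
    """Delete child rows first, then parents; return (detections, logs) deleted."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=retention_days)).isoformat(sep=" ")
    cur = conn.cursor()
    # Children first: detections that point at logs older than the cutoff
    cur.execute(
        "DELETE FROM attack_detections WHERE access_log_id IN "
        "(SELECT id FROM access_logs WHERE timestamp < ?)",
        (cutoff,),
    )
    detections = cur.rowcount
    # Parents second: the old access logs themselves
    cur.execute("DELETE FROM access_logs WHERE timestamp < ?", (cutoff,))
    logs = cur.rowcount
    conn.commit()
    return detections, logs


def demo():
    """Build a tiny in-memory database and purge rows older than 30 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE access_logs (id INTEGER PRIMARY KEY, timestamp TEXT)")
    conn.execute(
        "CREATE TABLE attack_detections ("
        "id INTEGER PRIMARY KEY, "
        "access_log_id INTEGER REFERENCES access_logs(id))"
    )
    conn.execute("INSERT INTO access_logs VALUES (1, '2026-01-01 00:00:00')")
    conn.execute("INSERT INTO access_logs VALUES (2, '2026-02-28 00:00:00')")
    conn.execute("INSERT INTO attack_detections VALUES (1, 1)")
    conn.execute("INSERT INTO attack_detections VALUES (2, 2)")
    return purge_older_than(conn, 30, now=datetime(2026, 3, 1))
```

Reversing the two `DELETE` statements would fail here with a foreign key error, because `PRAGMA foreign_keys = ON` makes SQLite enforce the reference.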


@@ -2,7 +2,7 @@ from database import get_database
from logger import get_app_logger
import requests
from sanitizer import sanitize_for_storage, sanitize_dict
from geo_utils import get_most_recent_geoip_data, extract_city_from_coordinates
from geo_utils import extract_geolocation_from_ip, fetch_blocklist_data
# ----------------------
# TASK CONFIG
@@ -27,34 +27,51 @@ def main():
)
for ip in unenriched_ips:
try:
api_url = "https://iprep.lcrawl.com/api/iprep/"
params = {"cidr": ip}
headers = {"Content-Type": "application/json"}
response = requests.get(api_url, headers=headers, params=params, timeout=10)
payload = response.json()
# Fetch geolocation data using ip-api.com
geoloc_data = extract_geolocation_from_ip(ip)
if payload.get("results"):
results = payload["results"]
# Fetch blocklist data from lcrawl API
blocklist_data = fetch_blocklist_data(ip)
# Get the most recent result (first in list, sorted by record_added)
most_recent = results[0]
geoip_data = most_recent.get("geoip_data", {})
list_on = most_recent.get("list_on", {})
if geoloc_data:
# Extract fields from the new API response
country_iso_code = geoloc_data.get("country_code")
country = geoloc_data.get("country")
region = geoloc_data.get("region")
region_name = geoloc_data.get("region_name")
city = geoloc_data.get("city")
timezone = geoloc_data.get("timezone")
isp = geoloc_data.get("isp")
reverse = geoloc_data.get("reverse")
asn = geoloc_data.get("asn")
asn_org = geoloc_data.get("org")
latitude = geoloc_data.get("latitude")
longitude = geoloc_data.get("longitude")
is_proxy = geoloc_data.get("is_proxy", False)
is_hosting = geoloc_data.get("is_hosting", False)
# Extract standard fields
country_iso_code = geoip_data.get("country_iso_code")
asn = geoip_data.get("asn_autonomous_system_number")
asn_org = geoip_data.get("asn_autonomous_system_organization")
latitude = geoip_data.get("location_latitude")
longitude = geoip_data.get("location_longitude")
# Use blocklist data if available, otherwise create default with flags
if blocklist_data:
list_on = blocklist_data
else:
list_on = {}
# Extract city from coordinates using reverse geocoding
city = extract_city_from_coordinates(geoip_data)
# Add flags to list_on
list_on["is_proxy"] = is_proxy
list_on["is_hosting"] = is_hosting
sanitized_country_iso_code = sanitize_for_storage(country_iso_code, 3)
sanitized_country = sanitize_for_storage(country, 100)
sanitized_region = sanitize_for_storage(region, 2)
sanitized_region_name = sanitize_for_storage(region_name, 100)
sanitized_asn = sanitize_for_storage(asn, 100)
sanitized_asn_org = sanitize_for_storage(asn_org, 100)
sanitized_city = sanitize_for_storage(city, 100) if city else None
sanitized_timezone = sanitize_for_storage(timezone, 50)
sanitized_isp = sanitize_for_storage(isp, 100)
sanitized_reverse = (
sanitize_for_storage(reverse, 255) if reverse else None
)
sanitized_list_on = sanitize_dict(list_on, 100000)
db_manager.update_ip_rep_infos(
@@ -63,11 +80,19 @@ def main():
sanitized_asn,
sanitized_asn_org,
sanitized_list_on,
sanitized_city,
latitude,
longitude,
city=sanitized_city,
latitude=latitude,
longitude=longitude,
country=sanitized_country,
region=sanitized_region,
region_name=sanitized_region_name,
timezone=sanitized_timezone,
isp=sanitized_isp,
reverse=sanitized_reverse,
is_proxy=is_proxy,
is_hosting=is_hosting,
)
except requests.RequestException as e:
app_logger.warning(f"Failed to fetch IP rep for {ip}: {e}")
app_logger.warning(f"Failed to fetch geolocation for {ip}: {e}")
except Exception as e:
app_logger.error(f"Error processing IP {ip}: {e}")
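
The hunk above merges the blocklist result with the proxy and hosting flags before sanitizing it. A minimal sketch of that merge step follows. `build_list_on` is a hypothetical helper name, and unlike the code above it copies the blocklist dict instead of mutating it in place.

```python
def build_list_on(blocklist_data, is_proxy=False, is_hosting=False):
    """Return blocklist memberships plus proxy/hosting flags as a new dict."""
    # Copy so the caller's dict is left untouched; fall back to an empty
    # dict when no blocklist data was returned.
    list_on = dict(blocklist_data) if blocklist_data else {}
    list_on["is_proxy"] = bool(is_proxy)
    list_on["is_hosting"] = bool(is_hosting)
    return list_on
```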


@@ -0,0 +1,46 @@
from database import get_database
from logger import get_app_logger
# ----------------------
# TASK CONFIG
# ----------------------
TASK_CONFIG = {
"name": "flag-stale-ips",
"cron": "0 2 * * *", # Run daily at 2 AM
"enabled": True,
"run_when_loaded": True,
}
# Set to True to force all IPs to be flagged for reevaluation on next run.
# Resets to False automatically after execution.
FORCE_IP_RESCAN = False
def main():
global FORCE_IP_RESCAN
app_logger = get_app_logger()
db = get_database()
try:
if FORCE_IP_RESCAN:
count = db.flag_all_ips_for_reevaluation()
FORCE_IP_RESCAN = False
app_logger.info(
f"[Background Task] flag-stale-ips: FORCE RESCAN - Flagged {count} IPs for reevaluation"
)
else:
count = db.flag_stale_ips_for_reevaluation()
if count > 0:
app_logger.info(
f"[Background Task] flag-stale-ips: Flagged {count} stale IPs for reevaluation"
)
else:
app_logger.debug(
"[Background Task] flag-stale-ips: No stale IPs found to flag"
)
except Exception as e:
app_logger.error(
f"[Background Task] flag-stale-ips: Error flagging stale IPs: {e}"
)


@@ -2,10 +2,12 @@
"""
Memory cleanup task for Krawl honeypot.
Periodically trims unbounded in-memory structures to prevent OOM.
NOTE: This task is no longer needed. Ban/rate-limit state has been moved from
in-memory ip_page_visits dict to the ip_stats DB table, eliminating unbounded
memory growth. Kept disabled for reference.
"""
from database import get_database
from logger import get_app_logger
# ----------------------
@@ -14,8 +16,8 @@ from logger import get_app_logger
TASK_CONFIG = {
"name": "memory-cleanup",
"cron": "*/5 * * * *", # Run every 5 minutes
"enabled": True,
"cron": "*/5 * * * *",
"enabled": False,
"run_when_loaded": False,
}
@@ -23,49 +25,4 @@ app_logger = get_app_logger()
def main():
"""
Clean up in-memory structures in the tracker.
Called periodically to prevent unbounded memory growth.
"""
try:
# Import here to avoid circular imports
from handler import Handler
if not Handler.tracker:
app_logger.warning("Tracker not initialized, skipping memory cleanup")
return
# Get memory stats before cleanup
stats_before = Handler.tracker.get_memory_stats()
# Run cleanup
Handler.tracker.cleanup_memory()
# Get memory stats after cleanup
stats_after = Handler.tracker.get_memory_stats()
# Log changes
access_log_reduced = (
stats_before["access_log_size"] - stats_after["access_log_size"]
)
cred_reduced = (
stats_before["credential_attempts_size"]
- stats_after["credential_attempts_size"]
)
if access_log_reduced > 0 or cred_reduced > 0:
app_logger.info(
f"Memory cleanup: Trimmed {access_log_reduced} access logs, "
f"{cred_reduced} credential attempts"
)
# Log current memory state for monitoring
app_logger.debug(
f"Memory stats after cleanup: "
f"access_logs={stats_after['access_log_size']}, "
f"credentials={stats_after['credential_attempts_size']}, "
f"unique_ips={stats_after['unique_ips_tracked']}"
)
except Exception as e:
app_logger.error(f"Error during memory cleanup: {e}")
app_logger.debug("memory-cleanup task is disabled (ban state now in DB)")


@@ -4,9 +4,14 @@ import os
from logger import get_app_logger
from database import get_database
from config import get_config
from models import IpStats
from models import IpStats, AccessLog
from ip_utils import is_valid_public_ip
from sqlalchemy import distinct
from firewall.fwtype import FWType
from firewall.iptables import Iptables
from firewall.raw import Raw
config = get_config()
app_logger = get_app_logger()
# ----------------------
@@ -20,7 +25,7 @@ TASK_CONFIG = {
}
EXPORTS_DIR = os.path.join(os.path.dirname(os.path.dirname(__file__)), "exports")
OUTPUT_FILE = os.path.join(EXPORTS_DIR, "malicious_ips.txt")
EXPORTS_DIR = config.exports_path
# ----------------------
@@ -48,7 +53,6 @@ def main():
)
# Filter out local/private IPs and the server's own IP
config = get_config()
server_ip = config.get_server_ip()
public_ips = [
@@ -61,14 +65,24 @@ def main():
os.makedirs(EXPORTS_DIR, exist_ok=True)
# Write IPs to file (one per line)
with open(OUTPUT_FILE, "w") as f:
for ip in public_ips:
f.write(f"{ip}\n")
for fwname in FWType._registry:
app_logger.info(
f"[Background Task] {task_name} exported {len(public_ips)} attacker IPs "
f"(filtered {len(attackers) - len(public_ips)} local/private IPs) to {OUTPUT_FILE}"
)
# Build the banlist in this firewall's format
fw = FWType.create(fwname)
banlist = fw.getBanlist(public_ips)
output_file = os.path.join(EXPORTS_DIR, f"{fwname}_banlist.txt")
if fwname == "raw":
output_file = os.path.join(EXPORTS_DIR, "malicious_ips.txt")
with open(output_file, "w") as f:
f.write(f"{banlist}\n")
app_logger.info(
f"[Background Task] {task_name} exported {len(public_ips)} public IPs in {fwname} format "
f"(filtered {len(attackers) - len(public_ips)} local/private IPs) to {output_file}"
)
except Exception as e:
app_logger.error(f"[Background Task] {task_name} failed: {e}")
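
The export loop above iterates `FWType._registry`, builds each backend with `FWType.create(fwname)`, and calls `getBanlist(public_ips)`. The `firewall` package itself is not part of this hunk, so the following is only a sketch of how such a registry could work. The `register` decorator and the exact iptables rule format are assumptions made for illustration.

```python
class FWType:
    """Base class with a name -> subclass registry (sketch, not Krawl's code)."""

    _registry = {}

    @classmethod
    def register(cls, name):
        """Class decorator that records a backend under the given name."""
        def decorator(subclass):
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def create(cls, name):
        """Instantiate the backend registered under the given name."""
        return cls._registry[name]()

    def getBanlist(self, ips):
        raise NotImplementedError


@FWType.register("raw")
class Raw(FWType):
    def getBanlist(self, ips):
        # One IP per line
        return "\n".join(ips)


@FWType.register("iptables")
class Iptables(FWType):
    def getBanlist(self, ips):
        # One DROP rule per IP (assumed rule format)
        return "\n".join(f"iptables -A INPUT -s {ip} -j DROP" for ip in ips)
```

With this layout, adding another export format only needs a new decorated subclass; the export loop itself stays unchanged.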


@@ -40,7 +40,6 @@ class TasksMaster:
def __init__(self, scheduler: BackgroundScheduler):
self.tasks = self._config_tasks()
self.scheduler = scheduler
self.last_run_times = {}
self.scheduler.add_listener(
self.job_listener, EVENT_JOB_EXECUTED | EVENT_JOB_ERROR
)
@@ -234,9 +233,6 @@ class TasksMaster:
app_logger.error(f"Failed to load {module_name}: {e}")
def job_listener(self, event):
job_id = event.job_id
self.last_run_times[job_id] = datetime.datetime.now()
if event.exception:
app_logger.error(f"Job {event.job_id} failed: {event.exception}")
else:

File diff suppressed because it is too large


@@ -0,0 +1,28 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Krawl Dashboard</title>
<link rel="icon" type="image/svg+xml" href="{{ dashboard_path }}/static/krawl-svg.svg" />
<link rel="stylesheet" href="{{ dashboard_path }}/static/vendor/css/leaflet.min.css" />
<link rel="stylesheet" href="{{ dashboard_path }}/static/vendor/css/MarkerCluster.css" />
<link rel="stylesheet" href="{{ dashboard_path }}/static/vendor/css/MarkerCluster.Default.css" />
<link rel="stylesheet" href="{{ dashboard_path }}/static/css/dashboard.css" />
<script src="{{ dashboard_path }}/static/vendor/js/leaflet.min.js" defer></script>
<script src="{{ dashboard_path }}/static/vendor/js/leaflet.markercluster.js" defer></script>
<script src="{{ dashboard_path }}/static/vendor/js/chart.min.js" defer></script>
<script src="{{ dashboard_path }}/static/vendor/js/htmx.min.js" defer></script>
<script defer src="{{ dashboard_path }}/static/vendor/js/alpine.min.js"></script>
<script>window.__DASHBOARD_PATH__ = '{{ dashboard_path }}';</script>
</head>
<body>
{% block content %}{% endblock %}
<script src="{{ dashboard_path }}/static/js/radar.js"></script>
<script src="{{ dashboard_path }}/static/js/dashboard.js"></script>
<script src="{{ dashboard_path }}/static/js/map.js"></script>
<script src="{{ dashboard_path }}/static/js/charts.js"></script>
{% block scripts %}{% endblock %}
</body>
</html>


@@ -0,0 +1,191 @@
{% extends "base.html" %}
{% block content %}
<div class="container" x-data="dashboardApp()" x-init="init()">
{# GitHub logo #}
<a href="https://github.com/BlessedRebuS/Krawl" target="_blank" class="github-logo">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8z"/></svg>
<span class="github-logo-text">Krawl</span>
</a>
{# Banlist export dropdown - Alpine.js #}
<div class="download-section">
<div class="banlist-dropdown" @click.outside="banlistOpen = false">
<button class="banlist-dropdown-btn" @click="banlistOpen = !banlistOpen">
Export IPs Banlist ▾
</button>
<div class="banlist-dropdown-menu" :class="{ 'show': banlistOpen }">
<a :href="dashboardPath + '/api/get_banlist?fwtype=raw'" download>
<span class="banlist-icon">📄</span> Raw IPs List
</a>
<a :href="dashboardPath + '/api/get_banlist?fwtype=iptables'" download>
<span class="banlist-icon">🔥</span> IPTables Rules
</a>
</div>
</div>
</div>
<h1>Krawl Dashboard</h1>
{# Stats cards - server-rendered #}
{% include "dashboard/partials/stats_cards.html" %}
{# Search bar #}
<div class="search-bar-container">
<div class="search-bar">
<svg class="search-icon" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20" fill="currentColor">
<path fill-rule="evenodd" d="M9 3.5a5.5 5.5 0 100 11 5.5 5.5 0 000-11zM2 9a7 7 0 1112.452 4.391l3.328 3.329a.75.75 0 11-1.06 1.06l-3.329-3.328A7 7 0 012 9z" clip-rule="evenodd"/>
</svg>
<input id="search-input"
type="search"
name="q"
placeholder="Search attacks, IPs, patterns, locations..."
autocomplete="off"
hx-get="{{ dashboard_path }}/htmx/search"
hx-trigger="input changed delay:300ms, search"
hx-target="#search-results-container"
hx-swap="innerHTML"
hx-indicator="#search-spinner" />
<span id="search-spinner" class="htmx-indicator search-spinner"></span>
</div>
<div id="search-results-container"></div>
</div>
{# Tab navigation - Alpine.js #}
<div class="tabs-container">
<a class="tab-button" :class="{ active: tab === 'overview' }" @click.prevent="switchToOverview()" href="#overview">Overview</a>
<a class="tab-button" :class="{ active: tab === 'attacks' }" @click.prevent="switchToAttacks()" href="#ip-stats">Attacks</a>
<a class="tab-button" :class="{ active: tab === 'ip-insight', disabled: !insightIp }" @click.prevent="insightIp && switchToIpInsight()" href="#ip-insight">
IP Insight<span x-show="insightIp" x-text="' (' + insightIp + ')'"></span>
</a>
</div>
{# ==================== OVERVIEW TAB ==================== #}
<div x-show="tab === 'overview'" x-init="$nextTick(() => { if (!mapInitialized && typeof initializeAttackerMap === 'function') { initializeAttackerMap(); mapInitialized = true; } })">
{# Map section #}
{% include "dashboard/partials/map_section.html" %}
{# Suspicious Activity - server-rendered (last 10 requests) #}
{% include "dashboard/partials/suspicious_table.html" %}
{# Top IPs + Top User-Agents side by side #}
<div style="display: flex; gap: 20px; flex-wrap: wrap;">
<div class="table-container" style="flex: 1; min-width: 300px;">
<h2>Top IP Addresses</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/top-ips?page=1"
hx-trigger="load"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
<div class="table-container" style="flex: 1; min-width: 300px;">
<h2>Top User-Agents</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/top-ua?page=1"
hx-trigger="load"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
</div>
{# Top Paths #}
<div class="table-container">
<h2>Top Paths</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/top-paths?page=1"
hx-trigger="load"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
</div>
{# ==================== ATTACKS TAB ==================== #}
<div x-show="tab === 'attacks'" x-cloak>
{# Attackers table - HTMX loaded #}
<div class="table-container alert-section">
<h2>Attackers by Total Requests</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/attackers?page=1"
hx-trigger="revealed"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
{# Credentials table #}
<div class="table-container alert-section">
<h2>Captured Credentials</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/credentials?page=1"
hx-trigger="revealed"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
{# Honeypot Triggers - HTMX loaded #}
<div class="table-container alert-section">
<h2>Honeypot Triggers by IP</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/honeypot?page=1"
hx-trigger="revealed"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
{# Attack Types table #}
<div class="table-container alert-section">
<h2>Detected Attack Types</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/attacks?page=1"
hx-trigger="revealed"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
{# Charts + Patterns side by side #}
<div class="charts-container">
<div class="table-container chart-section">
<h2>Most Recurring Attack Types</h2>
<div class="chart-wrapper">
<canvas id="attack-types-chart"></canvas>
</div>
</div>
<div class="table-container chart-section">
<h2>Most Recurring Attack Patterns</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/patterns?page=1"
hx-trigger="revealed"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
</div>
</div>
{# ==================== IP INSIGHT TAB ==================== #}
<div x-show="tab === 'ip-insight'" x-cloak>
{# IP Insight content - loaded via HTMX when IP is selected #}
<div id="ip-insight-container">
<template x-if="!insightIp">
<div class="table-container" style="text-align: center; padding: 60px 20px;">
<p style="color: #8b949e; font-size: 16px;">Select an IP address from any table to view detailed insights.</p>
</div>
</template>
<div x-show="insightIp" id="ip-insight-htmx-container"></div>
</div>
</div>
{# Raw request modal - Alpine.js #}
{% include "dashboard/partials/raw_request_modal.html" %}
</div>
{% endblock %}


@@ -0,0 +1,38 @@
{% extends "base.html" %}
{% block content %}
<div class="container" x-data="dashboardApp()" x-init="init()">
{# GitHub logo #}
<a href="https://github.com/BlessedRebuS/Krawl" target="_blank" class="github-logo">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8z"/></svg>
<span class="github-logo-text">Krawl</span>
</a>
{# Back to dashboard link #}
<div style="position: absolute; top: 0; right: 0;">
<a href="{{ dashboard_path }}/" class="download-btn" style="text-decoration: none;">
&larr; Back to Dashboard
</a>
</div>
{% set uid = "ip" %}
{% include "dashboard/partials/_ip_detail.html" %}
{# Raw Request Modal #}
<div class="raw-request-modal" x-show="rawModal.show" @click.self="closeRawModal()" x-cloak>
<div class="raw-request-modal-content">
<div class="raw-request-modal-header">
<h3>Raw Request</h3>
<span class="raw-request-modal-close" @click="closeRawModal()">&times;</span>
</div>
<div class="raw-request-modal-body">
<pre class="raw-request-content" x-text="rawModal.content"></pre>
</div>
<div class="raw-request-modal-footer">
<button class="raw-request-download-btn" @click="downloadRawRequest()">Download</button>
</div>
</div>
</div>
</div>
{% endblock %}


@@ -0,0 +1,295 @@
{# Shared IP detail content included by ip.html and ip_insight.html.
Expects: stats, ip_address, dashboard_path, uid (unique prefix for element IDs) #}
{# Page header #}
<div class="ip-page-header">
<h1>
<span class="ip-address-title">{{ ip_address }}</span>
{% if stats.category %}
<span class="category-badge category-{{ stats.category | lower | replace('_', '-') }}">
{{ stats.category | replace('_', ' ') | title }}
</span>
{% endif %}
</h1>
{% if stats.city or stats.country %}
<p class="ip-location-subtitle">
{{ stats.city | default('') }}{% if stats.city and stats.country %}, {% endif %}{{ stats.country | default(stats.country_code | default('')) }}
</p>
{% endif %}
</div>
{# ── Two-column layout: Info + Radar/Timeline ───── #}
<div class="ip-page-grid">
{# Left column: single IP Information card #}
<div class="ip-page-left">
<div class="table-container ip-detail-card ip-info-card">
<h2>IP Information</h2>
{# Activity section #}
<h3 class="ip-section-heading">Activity</h3>
<dl class="ip-dl">
<div class="ip-dl-row">
<dt>Total Requests</dt>
<dd>{{ stats.total_requests | default('N/A') }}</dd>
</div>
<div class="ip-dl-row">
<dt>First Seen</dt>
<dd class="ip-dl-highlight">{{ stats.first_seen | format_ts }}</dd>
</div>
<div class="ip-dl-row">
<dt>Last Seen</dt>
<dd class="ip-dl-highlight">{{ stats.last_seen | format_ts }}</dd>
</div>
{% if stats.last_analysis %}
<div class="ip-dl-row">
<dt>Last Analysis</dt>
<dd class="ip-dl-highlight">{{ stats.last_analysis | format_ts }}</dd>
</div>
{% endif %}
</dl>
{# Geo & Network section #}
<h3 class="ip-section-heading">Geo &amp; Network</h3>
<dl class="ip-dl">
{% if stats.city or stats.country %}
<div class="ip-dl-row">
<dt>Location</dt>
<dd>{{ stats.city | default('') | e }}{% if stats.city and stats.country %}, {% endif %}{{ stats.country | default(stats.country_code | default('')) | e }}</dd>
</div>
{% endif %}
{% if stats.region_name %}
<div class="ip-dl-row">
<dt>Region</dt>
<dd>{{ stats.region_name | e }}</dd>
</div>
{% endif %}
{% if stats.timezone %}
<div class="ip-dl-row">
<dt>Timezone</dt>
<dd>{{ stats.timezone | e }}</dd>
</div>
{% endif %}
{% if stats.isp %}
<div class="ip-dl-row">
<dt>ISP</dt>
<dd>{{ stats.isp | e }}</dd>
</div>
{% endif %}
{% if stats.asn_org %}
<div class="ip-dl-row">
<dt>Organization</dt>
<dd>{{ stats.asn_org | e }}</dd>
</div>
{% endif %}
{% if stats.asn %}
<div class="ip-dl-row">
<dt>ASN</dt>
<dd>AS{{ stats.asn }}</dd>
</div>
{% endif %}
{% if stats.reverse_dns %}
<div class="ip-dl-row">
<dt>Reverse DNS</dt>
<dd class="ip-dl-mono">{{ stats.reverse_dns | e }}</dd>
</div>
{% endif %}
</dl>
{# Reputation section #}
<h3 class="ip-section-heading">Reputation</h3>
<div class="ip-rep-scroll">
{# Flags #}
{% set flags = [] %}
{% if stats.is_proxy %}{% set _ = flags.append('Proxy') %}{% endif %}
{% if stats.is_hosting %}{% set _ = flags.append('Hosting') %}{% endif %}
{% if flags %}
<div class="ip-rep-row">
<span class="ip-rep-label">Flags</span>
<div class="ip-rep-tags">
{% for flag in flags %}
<span class="ip-flag">{{ flag }}</span>
{% endfor %}
</div>
</div>
{% endif %}
{# Blocklists #}
<div class="ip-rep-row">
<span class="ip-rep-label">Listed On</span>
{% if stats.blocklist_memberships %}
<div class="ip-rep-tags">
{% for bl in stats.blocklist_memberships %}
<span class="reputation-badge">{{ bl | e }}</span>
{% endfor %}
</div>
{% else %}
<span class="reputation-clean">Clean</span>
{% endif %}
</div>
</div>
</div>
</div>
{# Right column: Category Analysis + Timeline + Attack Types #}
<div class="ip-page-right">
{% if stats.category_scores %}
<div class="table-container ip-detail-card">
<h2>Category Analysis</h2>
<div class="radar-chart-container">
<div class="radar-chart" id="{{ uid }}-radar-chart"></div>
</div>
</div>
{% endif %}
{# Bottom row: Behavior Timeline + Attack Types side by side #}
<div class="ip-bottom-row">
{% if stats.category_history %}
<div class="table-container ip-detail-card ip-timeline-card">
<h2>Behavior Timeline</h2>
<div class="ip-timeline-scroll">
<div class="ip-timeline-hz">
{% for entry in stats.category_history %}
<div class="ip-tl-entry">
<div class="ip-tl-dot {{ entry.new_category | default('unknown') | replace('_', '-') }}"></div>
<div class="ip-tl-content">
<span class="ip-tl-cat">{{ entry.new_category | default('unknown') | replace('_', ' ') | title }}</span>
{% if entry.old_category %}
<span class="ip-tl-from">from {{ entry.old_category | replace('_', ' ') | title }}</span>
{% else %}
<span class="ip-tl-from">initial classification</span>
{% endif %}
<span class="ip-tl-time">{{ entry.timestamp | format_ts }}</span>
</div>
</div>
{% endfor %}
</div>
</div>
</div>
{% endif %}
<div class="table-container ip-detail-card ip-attack-types-card">
<h2>Attack Types</h2>
<div class="ip-attack-chart-wrapper">
<canvas id="{{ uid }}-attack-types-chart"></canvas>
</div>
</div>
</div>
</div>
</div>
{# Location map #}
{% if stats.latitude and stats.longitude %}
<div class="table-container" style="margin-top: 20px;">
<h2>Location</h2>
<div id="{{ uid }}-ip-map" style="height: 300px; border-radius: 6px; border: 1px solid #30363d;"></div>
</div>
{% endif %}
{# Detected Attack Types table only for attackers #}
{% if stats.category and stats.category | lower == 'attacker' %}
<div class="table-container alert-section" style="margin-top: 20px;">
<h2>Detected Attack Types</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/attacks?page=1&ip_filter={{ ip_address }}"
hx-trigger="revealed"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
{% endif %}
{# Access History table #}
<div class="table-container alert-section" style="margin-top: 20px;">
<h2>Access History</h2>
<div class="htmx-container"
hx-get="{{ dashboard_path }}/htmx/access-logs?page=1&ip_filter={{ ip_address }}"
hx-trigger="revealed"
hx-swap="innerHTML">
<div class="htmx-indicator">Loading...</div>
</div>
</div>
{# Inline init script #}
<script>
(function() {
var UID = '{{ uid }}';
// Radar chart
{% if stats.category_scores %}
var scores = {{ stats.category_scores | tojson }};
var radarEl = document.getElementById(UID + '-radar-chart');
if (radarEl && typeof generateRadarChart === 'function') {
radarEl.innerHTML = generateRadarChart(scores, 280, true, 'side');
}
{% endif %}
// Attack types chart
function initAttackChart() {
if (typeof loadAttackTypesChart === 'function') {
loadAttackTypesChart(UID + '-attack-types-chart', '{{ ip_address }}', 'bottom');
}
}
if (typeof Chart !== 'undefined') {
initAttackChart();
} else {
document.addEventListener('DOMContentLoaded', initAttackChart);
}
// Location map
{% if stats.latitude and stats.longitude %}
function initMap() {
var mapContainer = document.getElementById(UID + '-ip-map');
if (!mapContainer || typeof L === 'undefined') return;
if (mapContainer._leaflet_id) {
mapContainer._leaflet_id = null;
}
mapContainer.innerHTML = '';
var lat = {{ stats.latitude }};
var lng = {{ stats.longitude }};
var category = '{{ stats.category | default("unknown") | lower }}';
var categoryColors = {
attacker: '#f85149',
bad_crawler: '#f0883e',
good_crawler: '#3fb950',
regular_user: '#58a6ff',
unknown: '#8b949e'
};
var map = L.map(UID + '-ip-map', {
center: [lat, lng],
zoom: 6,
zoomControl: true,
scrollWheelZoom: true
});
L.tileLayer('https://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}{r}.png', {
attribution: '&copy; CartoDB | &copy; OpenStreetMap contributors',
maxZoom: 19,
subdomains: 'abcd'
}).addTo(map);
var color = categoryColors[category] || '#8b949e';
var markerHtml = '<div style="width:24px;height:24px;background:' + color +
';border:3px solid #fff;border-radius:50%;box-shadow:0 0 12px ' + color +
',0 0 24px ' + color + '80;"></div>';
var icon = L.divIcon({
html: markerHtml,
iconSize: [24, 24],
className: 'single-ip-marker'
});
L.marker([lat, lng], { icon: icon }).addTo(map);
}
setTimeout(initMap, 100);
{% else %}
var mapContainer = document.getElementById(UID + '-ip-map');
if (mapContainer) {
mapContainer.innerHTML = '<div style="display:flex;align-items:center;justify-content:center;height:100%;color:#8b949e;">Location data not available</div>';
}
{% endif %}
})();
</script>


@@ -0,0 +1,63 @@
{# HTMX fragment: access logs by IP table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} total</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/access-logs?page={{ pagination.page - 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}&ip_filter={{ ip_filter }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/access-logs?page={{ pagination.page + 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}&ip_filter={{ ip_filter }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>Path</th>
<th>User-Agent</th>
<th class="sortable {% if sort_by == 'timestamp' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/access-logs?page=1&sort_by=timestamp&sort_order={% if sort_by == 'timestamp' and sort_order == 'desc' %}asc{% else %}desc{% endif %}&ip_filter={{ ip_filter }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Time
</th>
<th style="width: 100px;"></th>
</tr>
</thead>
<tbody>
{% for log in items %}
<tr class="ip-row" data-ip="{{ log.ip | e }}">
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td>
<div class="path-cell-container">
<span class="path-truncated">{{ log.path | e }}</span>
{% if log.path | length > 30 %}
<div class="path-tooltip">{{ log.path | e }}</div>
{% endif %}
</div>
</td>
<td>{{ (log.user_agent | default(''))[:50] | e }}</td>
<td>{{ log.timestamp | format_ts }}</td>
<td>
{% if log.id %}
<button class="view-btn" @click="viewRawRequest({{ log.id }})">View Request</button>
{% endif %}
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="5" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% else %}
<tr><td colspan="5" style="text-align: center;">No logs detected</td></tr>
{% endfor %}
</tbody>
</table>
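
The fragment above numbers rows as `loop.index + (pagination.page - 1) * pagination.page_size` and disables the Prev/Next buttons at the first and last page. A small Python sketch of the same arithmetic follows. `row_rank` and `total_pages` are hypothetical names, and since this diff does not show how the server computes `total_pages`, ceiling division with a minimum of one page is an assumption.

```python
def row_rank(loop_index, page, page_size):
    """Global 1-based rank of a row, given its 1-based index on the page."""
    return loop_index + (page - 1) * page_size


def total_pages(total, page_size):
    """Number of pages needed for `total` items, never less than one."""
    # -(-a // b) is integer ceiling division
    return max(1, -(-total // page_size))
```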


@@ -0,0 +1,83 @@
{# HTMX fragment: Detected Attack Types table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} total</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/attacks?page={{ pagination.page - 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}{% if ip_filter %}&ip_filter={{ ip_filter }}{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/attacks?page={{ pagination.page + 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}{% if ip_filter %}&ip_filter={{ ip_filter }}{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>IP Address</th>
<th>Path</th>
<th>Attack Types</th>
<th>User-Agent</th>
<th class="sortable {% if sort_by == 'timestamp' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/attacks?page=1&sort_by=timestamp&sort_order={% if sort_by == 'timestamp' and sort_order == 'desc' %}asc{% else %}desc{% endif %}{% if ip_filter %}&ip_filter={{ ip_filter }}{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Time
</th>
<th style="width: 80px;"></th>
</tr>
</thead>
<tbody>
{% for attack in items %}
<tr class="ip-row" data-ip="{{ attack.ip | e }}">
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td class="ip-clickable"
hx-get="{{ dashboard_path }}/htmx/ip-detail/{{ attack.ip | e }}"
hx-target="next .ip-stats-row .ip-stats-dropdown"
hx-swap="innerHTML"
@click="toggleIpDetail($event)">
{{ attack.ip | e }}
</td>
<td>
<div class="path-cell-container">
<span class="path-truncated">{{ attack.path | e }}</span>
{% if attack.path | length > 30 %}
<div class="path-tooltip">{{ attack.path | e }}</div>
{% endif %}
</div>
</td>
<td>
<div class="attack-types-cell">
<span class="attack-types-truncated">{{ attack.attack_type | e }}</span>
{% if attack.attack_type | length > 30 %}
<div class="attack-types-tooltip">{{ attack.attack_type | e }}</div>
{% endif %}
</div>
</td>
<td>{{ (attack.user_agent | default(''))[:50] | e }}</td>
<td>{{ attack.timestamp | format_ts }}</td>
<td style="display: flex; gap: 6px; flex-wrap: wrap;">
{% if attack.log_id %}
<button class="view-btn" @click="viewRawRequest({{ attack.log_id }})">View Request</button>
{% endif %}
<button class="inspect-btn" @click="openIpInsight('{{ attack.ip | e }}')" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="7" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% else %}
<tr><td colspan="7" class="empty-state">No attacks detected</td></tr>
{% endfor %}
</tbody>
</table>
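Review note: the fragment above reads `pagination.page`, `pagination.page_size`, `pagination.total` and `pagination.total_pages`, and numbers rows with `loop.index + (pagination.page - 1) * pagination.page_size`. The route code that builds this object is not part of this hunk, so the following is only a minimal sketch of what a handler might pass in. The helper names `build_pagination` and `row_rank` are hypothetical, and clamping the page and reporting at least one page for an empty result are assumptions.

```python
import math


def build_pagination(total: int, page: int, page_size: int) -> dict:
    """Build the fields the template reads (hypothetical helper, not from the repo)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Assumption: an empty result set still reports one page, so both
    # Prev (page <= 1) and Next (page >= total_pages) render as disabled.
    total_pages = max(1, math.ceil(total / page_size))
    # Assumption: out-of-range page numbers are clamped rather than rejected.
    page = min(max(1, page), total_pages)
    return {
        "page": page,
        "page_size": page_size,
        "total": total,
        "total_pages": total_pages,
    }


def row_rank(loop_index: int, pagination: dict) -> int:
    """Mirror of the template expression used in the rank column (loop.index is 1-based)."""
    return loop_index + (pagination["page"] - 1) * pagination["page_size"]
```

With 45 rows and a page size of 20, page 2 starts its numbering at 21, which is what the `rank` cell shows for the first row of that page.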

View File

@@ -0,0 +1,74 @@
{# HTMX fragment: Attackers table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} attackers</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/attackers?page={{ pagination.page - 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/attackers?page={{ pagination.page + 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table class="ip-stats-table">
<thead>
<tr>
<th>#</th>
<th>IP Address</th>
<th class="sortable {% if sort_by == 'total_requests' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/attackers?page=1&sort_by=total_requests&sort_order={% if sort_by == 'total_requests' and sort_order == 'desc' %}asc{% else %}desc{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Total Requests
</th>
<th class="sortable {% if sort_by == 'first_seen' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/attackers?page=1&sort_by=first_seen&sort_order={% if sort_by == 'first_seen' and sort_order == 'desc' %}asc{% else %}desc{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
First Seen</th>
<th class="sortable {% if sort_by == 'last_seen' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/attackers?page=1&sort_by=last_seen&sort_order={% if sort_by == 'last_seen' and sort_order == 'desc' %}asc{% else %}desc{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Last Seen</th>
<th>Location</th>
<th style="width: 40px;"></th>
</tr>
</thead>
<tbody>
{% for ip in items %}
<tr class="ip-row" data-ip="{{ ip.ip | e }}">
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td class="ip-clickable"
hx-get="{{ dashboard_path }}/htmx/ip-detail/{{ ip.ip | e }}"
hx-target="next .ip-stats-row .ip-stats-dropdown"
hx-swap="innerHTML"
@click="toggleIpDetail($event)">
{{ ip.ip | e }}
</td>
<td>{{ ip.total_requests }}</td>
<td>{{ ip.first_seen | format_ts }}</td>
<td>{{ ip.last_seen | format_ts }}</td>
<td>{{ ip.city | default('', true) | e }}{% if ip.city %}, {% endif %}{{ ip.country_code | default('N/A', true) | e }}</td>
<td>
<button class="inspect-btn" @click="openIpInsight('{{ ip.ip | e }}')" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="7" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% else %}
<tr><td colspan="7" class="empty-state">No attackers found</td></tr>
{% endfor %}
</tbody>
</table>
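Review note: the Location cell combines `city` and `country_code`, falling back to `N/A` for the country. A small sketch of one possible reading of the intended output follows. The function name `format_location` is hypothetical, and treating `None` or an empty string as "missing" is an assumption about the data the handler supplies.

```python
def format_location(city, country_code):
    """Render 'City, CC', falling back to 'N/A' for a missing country code.

    Hypothetical helper: None and '' are both treated as missing values.
    """
    country = country_code or "N/A"
    return f"{city}, {country}" if city else country
```

Keeping this logic in one place (for example as a template filter) would avoid repeating the same expression in every table that shows a location.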

View File

@@ -0,0 +1,66 @@
{# HTMX fragment: Captured Credentials table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} total</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/credentials?page={{ pagination.page - 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/credentials?page={{ pagination.page + 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>IP Address</th>
<th>Username</th>
<th>Password</th>
<th>Path</th>
<th class="sortable {% if sort_by == 'timestamp' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/credentials?page=1&sort_by=timestamp&sort_order={% if sort_by == 'timestamp' and sort_order == 'desc' %}asc{% else %}desc{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Time
</th>
<th style="width: 40px;"></th>
</tr>
</thead>
<tbody>
{% for cred in items %}
<tr class="ip-row" data-ip="{{ cred.ip | e }}">
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td class="ip-clickable"
hx-get="{{ dashboard_path }}/htmx/ip-detail/{{ cred.ip | e }}"
hx-target="next .ip-stats-row .ip-stats-dropdown"
hx-swap="innerHTML"
@click="toggleIpDetail($event)">
{{ cred.ip | e }}
</td>
<td>{{ cred.username | default('N/A', true) | e }}</td>
<td>{{ cred.password | default('N/A', true) | e }}</td>
<td>{{ cred.path | default('', true) | e }}</td>
<td>{{ cred.timestamp | format_ts }}</td>
<td>
<button class="inspect-btn" @click="openIpInsight('{{ cred.ip | e }}')" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="7" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% else %}
<tr><td colspan="7" class="empty-state">No credentials captured</td></tr>
{% endfor %}
</tbody>
</table>
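Review note: every sortable header in these fragments uses the same rule to pick the `sort_order` for its `hx-get` URL: if the table is already sorted by that column in descending order, the next click requests ascending; in every other case it requests descending. The sketch below restates that rule in Python; `next_sort_order` is a hypothetical name used only for illustration.

```python
def next_sort_order(current_by: str, current_order: str, column: str) -> str:
    """Return the sort order a header click should request for `column`.

    Mirrors the template condition:
    {% if sort_by == column and sort_order == 'desc' %}asc{% else %}desc{% endif %}
    """
    if current_by == column and current_order == "desc":
        return "asc"
    return "desc"
```

One consequence of this rule is that switching to a different column always starts in descending order, and the header links always reset to `page=1`.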

View File

@@ -0,0 +1,60 @@
{# HTMX fragment: Honeypot triggers table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} total</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/honeypot?page={{ pagination.page - 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/honeypot?page={{ pagination.page + 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>IP Address</th>
<th class="sortable {% if sort_by == 'count' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/honeypot?page=1&sort_by=count&sort_order={% if sort_by == 'count' and sort_order == 'desc' %}asc{% else %}desc{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Honeypot Triggers
</th>
<th style="width: 40px;"></th>
</tr>
</thead>
<tbody>
{% for item in items %}
<tr class="ip-row" data-ip="{{ item.ip | e }}">
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td class="ip-clickable"
hx-get="{{ dashboard_path }}/htmx/ip-detail/{{ item.ip | e }}"
hx-target="next .ip-stats-row .ip-stats-dropdown"
hx-swap="innerHTML"
@click="toggleIpDetail($event)">
{{ item.ip | e }}
</td>
<td>{{ item.count }}</td>
<td>
<button class="inspect-btn" @click="openIpInsight('{{ item.ip | e }}')" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="4" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% else %}
<tr><td colspan="4" class="empty-state">No data</td></tr>
{% endfor %}
</tbody>
</table>

View File

@@ -0,0 +1,131 @@
{# HTMX fragment: IP detail expansion row content #}
{# Replaces the ~250 line formatIpStats() JavaScript function #}
<div class="stats-left">
<div class="stat-row">
<span class="stat-label-sm">Total Requests:</span>
<span class="stat-value-sm">{{ stats.total_requests | default('N/A') }}</span>
</div>
<div class="stat-row">
<span class="stat-label-sm">First Seen:</span>
<span class="stat-value-sm">{{ stats.first_seen | format_ts }}</span>
</div>
<div class="stat-row">
<span class="stat-label-sm">Last Seen:</span>
<span class="stat-value-sm">{{ stats.last_seen | format_ts }}</span>
</div>
{% if stats.city or stats.country_code %}
<div class="stat-row">
<span class="stat-label-sm">Location:</span>
<span class="stat-value-sm">{{ stats.city | default('', true) | e }}{% if stats.city and stats.country_code %}, {% endif %}{{ stats.country_code | default('', true) | e }}</span>
</div>
{% endif %}
{% if stats.reverse_dns %}
<div class="stat-row">
<span class="stat-label-sm">Reverse DNS:</span>
<span class="stat-value-sm">{{ stats.reverse_dns | e }}</span>
</div>
{% endif %}
{% if stats.asn_org %}
<div class="stat-row">
<span class="stat-label-sm">ASN Org:</span>
<span class="stat-value-sm">{{ stats.asn_org | e }}</span>
</div>
{% endif %}
{% if stats.asn %}
<div class="stat-row">
<span class="stat-label-sm">ASN:</span>
<span class="stat-value-sm">{{ stats.asn | e }}</span>
</div>
{% endif %}
{% if stats.isp %}
<div class="stat-row">
<span class="stat-label-sm">ISP:</span>
<span class="stat-value-sm">{{ stats.isp | e }}</span>
</div>
{% endif %}
{# Flags #}
{% set flags = [] %}
{% if stats.is_proxy %}{% set _ = flags.append('Proxy') %}{% endif %}
{% if stats.is_hosting %}{% set _ = flags.append('Hosting') %}{% endif %}
{% if flags %}
<div class="stat-row">
<span class="stat-label-sm">Flags:</span>
<span class="stat-value-sm">{{ flags | join(', ') }}</span>
</div>
{% endif %}
{% if stats.reputation_score is not none %}
<div class="stat-row">
<span class="stat-label-sm">Reputation Score:</span>
<span class="stat-value-sm" style="color: {% if stats.reputation_score <= 30 %}#f85149{% elif stats.reputation_score <= 60 %}#f0883e{% else %}#3fb950{% endif %}">
{{ stats.reputation_score }}/100
</span>
</div>
{% endif %}
{% if stats.category %}
<div class="stat-row">
<span class="stat-label-sm">Category:</span>
<span class="category-badge category-{{ stats.category | lower | replace('_', '-') }}">
{{ stats.category | replace('_', ' ') | title }}
</span>
</div>
{% endif %}
{# Timeline + Reputation section #}
{% if stats.category_history or stats.blocklist_memberships %}
<div class="timeline-section">
<div class="timeline-container">
{# Behavior Timeline #}
{% if stats.category_history %}
<div class="timeline-column">
<div class="timeline-header">Behavior Timeline</div>
<div class="timeline">
{% for entry in stats.category_history %}
<div class="timeline-item">
<div class="timeline-marker {{ entry.new_category | default('unknown') | replace('_', '-') }}"></div>
<div>
<strong>{{ entry.new_category | default('unknown') | replace('_', ' ') | title }}</strong>
{% if entry.old_category %}<span style="color: #8b949e;"> from {{ entry.old_category | replace('_', ' ') | title }}</span>{% endif %}
<br><span style="color: #8b949e; font-size: 11px;">{{ entry.timestamp | format_ts }}</span>
</div>
</div>
{% endfor %}
</div>
</div>
{% endif %}
{# Reputation / Listed On #}
<div class="timeline-column">
<div class="timeline-header">Reputation</div>
{% if stats.blocklist_memberships %}
<div class="reputation-title">Listed On</div>
{% for bl in stats.blocklist_memberships %}
<span class="reputation-badge">{{ bl | e }}</span>
{% endfor %}
{% else %}
<span class="reputation-clean">Clean - Not listed on any blocklists</span>
{% endif %}
</div>
</div>
</div>
{% endif %}
</div>
{# Radar chart (right side) #}
{% if stats.category_scores %}
<div class="stats-right">
<div class="radar-chart" id="radar-{{ stats.ip | default('') | replace('.', '-') | replace(':', '-') }}">
<script>
(function() {
const scores = {{ stats.category_scores | tojson }};
const container = document.getElementById('radar-{{ stats.ip | default("") | replace(".", "-") | replace(":", "-") }}');
if (container && typeof generateRadarChart === 'function') {
container.innerHTML = generateRadarChart(scores, 200);
}
})();
</script>
</div>
</div>
{% endif %}
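Review note: two small rules are embedded inline in this fragment: the colour thresholds for the reputation score, and the way the radar container `id` is derived from the IP. The sketch below restates both so the boundaries are easy to check. The function names are hypothetical; the thresholds and replacements are copied from the template above.

```python
def reputation_color(score: int) -> str:
    """Colour used for the reputation score: red up to 30, orange up to 60, else green."""
    if score <= 30:
        return "#f85149"
    if score <= 60:
        return "#f0883e"
    return "#3fb950"


def radar_element_id(ip: str) -> str:
    """DOM id of the radar container: dots and colons in the IP become hyphens."""
    return "radar-" + ip.replace(".", "-").replace(":", "-")
```

Because the id depends only on the IP, two detail panels for the same address open at once would share an id, and `document.getElementById` would resolve to the first one in the document.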

View File

@@ -0,0 +1,5 @@
{# HTMX fragment: IP Insight - inline display within dashboard tabs #}
<div class="ip-insight-content" id="ip-insight-content">
{% set uid = "insight" %}
{% include "dashboard/partials/_ip_detail.html" %}
</div>

View File

@@ -0,0 +1,38 @@
{# Map section with filter checkboxes #}
<div class="table-container">
<h2>IP Origins Map</h2>
<div style="margin-bottom: 10px; display: flex; gap: 15px; flex-wrap: wrap; align-items: center;">
<label style="display: flex; align-items: center; gap: 6px; cursor: pointer; color: #8b949e; font-size: 13px;">
Show top
<select id="map-ip-limit" onchange="if(typeof reloadMapWithLimit==='function') reloadMapWithLimit(this.value)" style="background: #161b22; color: #c9d1d9; border: 1px solid #30363d; border-radius: 4px; padding: 2px 6px; font-size: 13px; cursor: pointer;">
<option value="10">10</option>
<option value="100" selected>100</option>
<option value="1000">1,000</option>
<option value="all">All</option>
</select>
IPs
</label>
<span style="color: #30363d;">|</span>
<label style="display: flex; align-items: center; gap: 4px; cursor: pointer;">
<input type="checkbox" checked onchange="if(typeof updateMapFilters==='function') updateMapFilters()" class="map-filter" data-category="attacker">
<span style="color: #f85149;">Attackers</span>
</label>
<label style="display: flex; align-items: center; gap: 4px; cursor: pointer;">
<input type="checkbox" checked onchange="if(typeof updateMapFilters==='function') updateMapFilters()" class="map-filter" data-category="bad_crawler">
<span style="color: #f0883e;">Bad Crawlers</span>
</label>
<label style="display: flex; align-items: center; gap: 4px; cursor: pointer;">
<input type="checkbox" checked onchange="if(typeof updateMapFilters==='function') updateMapFilters()" class="map-filter" data-category="good_crawler">
<span style="color: #3fb950;">Good Crawlers</span>
</label>
<label style="display: flex; align-items: center; gap: 4px; cursor: pointer;">
<input type="checkbox" checked onchange="if(typeof updateMapFilters==='function') updateMapFilters()" class="map-filter" data-category="regular_user">
<span style="color: #58a6ff;">Regular Users</span>
</label>
<label style="display: flex; align-items: center; gap: 4px; cursor: pointer;">
<input type="checkbox" checked onchange="if(typeof updateMapFilters==='function') updateMapFilters()" class="map-filter" data-category="unknown">
<span style="color: #8b949e;">Unknown</span>
</label>
</div>
<div id="attacker-map" style="height: 450px; border-radius: 6px; border: 1px solid #30363d;"></div>
</div>

View File

@@ -0,0 +1,43 @@
{# HTMX fragment: Attack Patterns table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} patterns</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/patterns?page={{ pagination.page - 1 }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/patterns?page={{ pagination.page + 1 }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>Attack Pattern</th>
<th>Occurrences</th>
</tr>
</thead>
<tbody>
{% for pattern in items %}
<tr>
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td>
<div class="attack-types-cell">
<span class="attack-types-truncated">{{ pattern.pattern | e }}</span>
{% if pattern.pattern | length > 40 %}
<div class="attack-types-tooltip">{{ pattern.pattern | e }}</div>
{% endif %}
</div>
</td>
<td>{{ pattern.count }}</td>
</tr>
{% else %}
<tr><td colspan="3" class="empty-state">No patterns found</td></tr>
{% endfor %}
</tbody>
</table>

View File

@@ -0,0 +1,20 @@
{# Raw request viewer modal - Alpine.js controlled #}
<div class="raw-request-modal"
x-show="rawModal.show"
x-cloak
@click.self="closeRawModal()"
@keydown.escape.window="closeRawModal()"
>
<div class="raw-request-modal-content">
<div class="raw-request-modal-header">
<h3>Raw HTTP Request</h3>
<span class="raw-request-modal-close" @click="closeRawModal()">&times;</span>
</div>
<div class="raw-request-modal-body">
<pre class="raw-request-content" x-text="rawModal.content"></pre>
</div>
<div class="raw-request-modal-footer">
<button class="raw-request-download-btn" @click="downloadRawRequest()">Download as .txt</button>
</div>
</div>
</div>

View File

@@ -0,0 +1,164 @@
{# HTMX fragment: Search results for attacks and IPs #}
<div class="search-results">
<div class="search-results-header">
<span class="search-results-summary">
Found <strong>{{ pagination.total_attacks }}</strong> attack{{ 's' if pagination.total_attacks != 1 else '' }}
and <strong>{{ pagination.total_ips }}</strong> IP{{ 's' if pagination.total_ips != 1 else '' }}
for &ldquo;<em>{{ query | e }}</em>&rdquo;
</span>
<button class="search-close-btn" onclick="document.getElementById('search-input').value=''; document.getElementById('search-results-container').innerHTML='';">&times;</button>
</div>
{# ── Matching IPs ─────────────────────────────────── #}
{% if ips %}
<div class="search-section">
<h3 class="search-section-title">Matching IPs</h3>
<table>
<thead>
<tr>
<th>#</th>
<th>IP Address</th>
<th>Requests</th>
<th>Category</th>
<th>Location</th>
<th>ISP / ASN</th>
<th>Last Seen</th>
<th style="width: 40px;"></th>
</tr>
</thead>
<tbody>
{% for ip in ips %}
<tr class="ip-row" data-ip="{{ ip.ip | e }}">
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td class="ip-clickable"
hx-get="{{ dashboard_path }}/htmx/ip-detail/{{ ip.ip | e }}"
hx-target="next .ip-stats-row .ip-stats-dropdown"
hx-swap="innerHTML"
@click="toggleIpDetail($event)">
{{ ip.ip | e }}
</td>
<td>{{ ip.total_requests }}</td>
<td>
{% if ip.category %}
<span class="category-badge category-{{ ip.category | default('unknown') | replace('_', '-') | lower }}">
{{ ip.category | e }}
</span>
{% else %}
<span class="category-badge category-unknown">unknown</span>
{% endif %}
</td>
<td>{{ ip.city | default('', true) | e }}{% if ip.city %}, {% endif %}{{ ip.country_code | default('N/A', true) | e }}</td>
<td>{{ ip.isp | default(ip.asn_org, true) | default('N/A', true) | e }}</td>
<td>{{ ip.last_seen | format_ts }}</td>
<td>
<button class="inspect-btn" @click="openIpInsight('{{ ip.ip | e }}')" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="8" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% endif %}
{# ── Matching Attacks ─────────────────────────────── #}
{% if attacks %}
<div class="search-section">
<h3 class="search-section-title">Matching Attacks</h3>
<table>
<thead>
<tr>
<th>#</th>
<th>IP Address</th>
<th>Path</th>
<th>Attack Types</th>
<th>User-Agent</th>
<th>Time</th>
<th>Actions</th>
</tr>
</thead>
<tbody>
{% for attack in attacks %}
<tr class="ip-row" data-ip="{{ attack.ip | e }}">
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td class="ip-clickable"
hx-get="{{ dashboard_path }}/htmx/ip-detail/{{ attack.ip | e }}"
hx-target="next .ip-stats-row .ip-stats-dropdown"
hx-swap="innerHTML"
@click="toggleIpDetail($event)">
{{ attack.ip | e }}
</td>
<td>
<div class="path-cell-container">
<span class="path-truncated">{{ attack.path | e }}</span>
{% if attack.path | length > 30 %}
<div class="path-tooltip">{{ attack.path | e }}</div>
{% endif %}
</div>
</td>
<td>
<div class="attack-types-cell">
{% set types_str = attack.attack_types | join(', ') %}
<span class="attack-types-truncated">{{ types_str | e }}</span>
{% if types_str | length > 30 %}
<div class="attack-types-tooltip">{{ types_str | e }}</div>
{% endif %}
</div>
</td>
<td>{{ (attack.user_agent | default(''))[:50] | e }}</td>
<td>{{ attack.timestamp | format_ts }}</td>
<td>
{% if attack.log_id %}
<button class="view-btn" @click="viewRawRequest({{ attack.log_id }})">View Request</button>
{% endif %}
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="7" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% endif %}
{# ── Pagination ───────────────────────────────────── #}
{% if pagination.total_pages > 1 %}
<div class="search-pagination">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }}</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/search?q={{ query | urlencode }}&page={{ pagination.page - 1 }}"
hx-target="#search-results-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/search?q={{ query | urlencode }}&page={{ pagination.page + 1 }}"
hx-target="#search-results-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
{% endif %}
{# ── No results ───────────────────────────────────── #}
{% if not attacks and not ips %}
<div class="search-no-results">
No results found for &ldquo;<em>{{ query | e }}</em>&rdquo;
</div>
{% endif %}
</div>

View File

@@ -0,0 +1,31 @@
{# Stats cards - server-rendered on initial page load #}
<div class="stats-grid">
<div class="stat-card">
<div class="stat-value">{{ stats.total_accesses }}</div>
<div class="stat-label">Total Accesses</div>
</div>
<div class="stat-card">
<div class="stat-value">{{ stats.unique_ips }}</div>
<div class="stat-label">Unique IPs</div>
</div>
<div class="stat-card">
<div class="stat-value">{{ stats.unique_paths }}</div>
<div class="stat-label">Unique Paths</div>
</div>
<div class="stat-card alert">
<div class="stat-value alert">{{ stats.suspicious_accesses }}</div>
<div class="stat-label">Suspicious Accesses</div>
</div>
<div class="stat-card alert">
<div class="stat-value alert">{{ stats.honeypot_ips | default(0) }}</div>
<div class="stat-label">Honeypot Caught</div>
</div>
<div class="stat-card alert">
<div class="stat-value alert">{{ stats.credential_count | default(0) }}</div>
<div class="stat-label">Credentials Captured</div>
</div>
<div class="stat-card alert">
<div class="stat-value alert">{{ stats.unique_attackers | default(0) }}</div>
<div class="stat-label">Unique Attackers</div>
</div>
</div>

View File

@@ -0,0 +1,45 @@
{# Recent Suspicious Activity - server-rendered on page load #}
<div class="table-container alert-section">
<h2>Recent Suspicious Activity</h2>
<table>
<thead>
<tr>
<th>IP Address</th>
<th>Path</th>
<th>User-Agent</th>
<th>Time</th>
<th style="width: 40px;"></th>
</tr>
</thead>
<tbody>
{% for activity in suspicious_activities %}
<tr class="ip-row" data-ip="{{ activity.ip | e }}">
<td class="ip-clickable"
hx-get="{{ dashboard_path }}/htmx/ip-detail/{{ activity.ip | e }}"
hx-target="next .ip-stats-row .ip-stats-dropdown"
hx-swap="innerHTML"
@click="toggleIpDetail($event)">
{{ activity.ip | e }}
</td>
<td>{{ activity.path | e }}</td>
<td style="word-break: break-all;">{{ (activity.user_agent | default(''))[:80] | e }}</td>
<td>{{ activity.timestamp | format_ts(time_only=True) }}</td>
<td>
<button class="inspect-btn" @click="openIpInsight('{{ activity.ip | e }}')" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="5" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% else %}
<tr><td colspan="5" class="empty-state">No suspicious activity detected</td></tr>
{% endfor %}
</tbody>
</table>
</div>
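Review note: the templates rely on a `format_ts` filter, called both as `| format_ts` and as `| format_ts(time_only=True)`. Its implementation is not shown in this part of the diff, so the following is only a sketch of a filter with that call shape. The accepted input types, the output formats and the `N/A` fallback are all assumptions, not the project's actual behaviour.

```python
from datetime import datetime


def format_ts(value, time_only: bool = False) -> str:
    """Hypothetical timestamp filter matching the call shape used in the templates."""
    if not value:
        # Assumption: missing timestamps render as a placeholder.
        return "N/A"
    if isinstance(value, str):
        try:
            value = datetime.fromisoformat(value)
        except ValueError:
            # Assumption: unparseable strings are shown unchanged.
            return value
    fmt = "%H:%M:%S" if time_only else "%Y-%m-%d %H:%M:%S"
    return value.strftime(fmt)
```

A function like this would typically be registered on the Jinja2 environment under the name `format_ts` (for example via `env.filters["format_ts"] = format_ts`) so the templates can call it as a filter.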

View File

@@ -0,0 +1,66 @@
{# HTMX fragment: Top IPs table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} total</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/top-ips?page={{ pagination.page - 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/top-ips?page={{ pagination.page + 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>IP Address</th>
<th>Category</th>
<th class="sortable {% if sort_by == 'count' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/top-ips?page=1&sort_by=count&sort_order={% if sort_by == 'count' and sort_order == 'desc' %}asc{% else %}desc{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Access Count
</th>
<th style="width: 40px;"></th>
</tr>
</thead>
<tbody>
{% for item in items %}
<tr class="ip-row" data-ip="{{ item.ip | e }}">
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td class="ip-clickable"
hx-get="{{ dashboard_path }}/htmx/ip-detail/{{ item.ip | e }}"
hx-target="next .ip-stats-row .ip-stats-dropdown"
hx-swap="innerHTML"
@click="toggleIpDetail($event)">
{{ item.ip | e }}
</td>
<td>
{% set cat = item.category | default('unknown') %}
{% set cat_colors = {'attacker': '#f85149', 'good_crawler': '#3fb950', 'bad_crawler': '#f0883e', 'regular_user': '#58a6ff', 'unknown': '#8b949e'} %}
<span class="category-dot" style="display: inline-block; width: 12px; height: 12px; border-radius: 50%; background: {{ cat_colors.get(cat, '#8b949e') }};" title="{{ cat | replace('_', ' ') | title }}"></span>
</td>
<td>{{ item.count }}</td>
<td>
<button class="inspect-btn" @click="openIpInsight('{{ item.ip | e }}')" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</td>
</tr>
<tr class="ip-stats-row" style="display: none;">
<td colspan="5" class="ip-stats-cell">
<div class="ip-stats-dropdown">
<div class="loading">Loading stats...</div>
</div>
</td>
</tr>
{% else %}
<tr><td colspan="5" class="empty-state">No data</td></tr>
{% endfor %}
</tbody>
</table>
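Review note: the Category column maps a category key to a dot colour and a human-readable tooltip, with grey as the fallback. The sketch below restates that lookup; the colour values are copied from `cat_colors` in the template, while the name `category_dot` and the handling of `None` as `unknown` are assumptions.

```python
CATEGORY_COLORS = {
    "attacker": "#f85149",
    "good_crawler": "#3fb950",
    "bad_crawler": "#f0883e",
    "regular_user": "#58a6ff",
    "unknown": "#8b949e",
}


def category_dot(category):
    """Return (colour, tooltip) for a category key (hypothetical helper).

    Assumption: None or '' is treated the same as 'unknown'.
    """
    cat = category or "unknown"
    color = CATEGORY_COLORS.get(cat, "#8b949e")
    title = cat.replace("_", " ").title()
    return color, title
```

The same colours are used for the map filter labels, so a shared mapping passed in the template context could keep the two in sync.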

View File

@@ -0,0 +1,41 @@
{# HTMX fragment: Top Paths table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} total</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/top-paths?page={{ pagination.page - 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/top-paths?page={{ pagination.page + 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>Path</th>
<th class="sortable {% if sort_by == 'count' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/top-paths?page=1&sort_by=count&sort_order={% if sort_by == 'count' and sort_order == 'desc' %}asc{% else %}desc{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Access Count
</th>
</tr>
</thead>
<tbody>
{% for item in items %}
<tr>
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td>{{ item.path | e }}</td>
<td>{{ item.count }}</td>
</tr>
{% else %}
<tr><td colspan="3" class="empty-state">No data</td></tr>
{% endfor %}
</tbody>
</table>

@@ -0,0 +1,41 @@
{# HTMX fragment: Top User-Agents table #}
<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px;">
<span class="pagination-info">Page {{ pagination.page }}/{{ pagination.total_pages }} &mdash; {{ pagination.total }} total</span>
<div style="display: flex; gap: 8px;">
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/top-ua?page={{ pagination.page - 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page <= 1 %}disabled{% endif %}>Prev</button>
<button class="pagination-btn"
hx-get="{{ dashboard_path }}/htmx/top-ua?page={{ pagination.page + 1 }}&sort_by={{ sort_by }}&sort_order={{ sort_order }}"
hx-target="closest .htmx-container"
hx-swap="innerHTML"
{% if pagination.page >= pagination.total_pages %}disabled{% endif %}>Next</button>
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>User-Agent</th>
<th class="sortable {% if sort_by == 'count' %}{{ sort_order }}{% endif %}"
hx-get="{{ dashboard_path }}/htmx/top-ua?page=1&sort_by=count&sort_order={% if sort_by == 'count' and sort_order == 'desc' %}asc{% else %}desc{% endif %}"
hx-target="closest .htmx-container"
hx-swap="innerHTML">
Count
</th>
</tr>
</thead>
<tbody>
{% for item in items %}
<tr>
<td class="rank">{{ loop.index + (pagination.page - 1) * pagination.page_size }}</td>
<td>{{ item.user_agent | e }}</td>
<td>{{ item.count }}</td>
</tr>
{% else %}
<tr><td colspan="3" class="empty-state">No data</td></tr>
{% endfor %}
</tbody>
</table>

File diff suppressed because it is too large

@@ -0,0 +1,181 @@
// Chart.js Attack Types Chart
// Extracted from dashboard_template.py (lines ~3370-3550)
let attackTypesChart = null;
let attackTypesChartLoaded = false;
/**
* Load an attack types doughnut chart into a canvas element.
* @param {string} [canvasId='attack-types-chart'] - Canvas element ID
* @param {string} [ipFilter] - Optional IP address to scope results
* @param {string} [legendPosition='right'] - Legend position
*/
async function loadAttackTypesChart(canvasId, ipFilter, legendPosition) {
canvasId = canvasId || 'attack-types-chart';
legendPosition = legendPosition || 'right';
const DASHBOARD_PATH = window.__DASHBOARD_PATH__ || '';
try {
const canvas = document.getElementById(canvasId);
if (!canvas) return;
let url = DASHBOARD_PATH + '/api/attack-types-stats?limit=10';
if (ipFilter) url += '&ip_filter=' + encodeURIComponent(ipFilter);
const response = await fetch(url, {
cache: 'no-store',
headers: {
'Cache-Control': 'no-cache',
'Pragma': 'no-cache'
}
});
if (!response.ok) throw new Error('Failed to fetch attack types');
const data = await response.json();
const attackTypes = data.attack_types || [];
if (attackTypes.length === 0) {
canvas.parentElement.innerHTML = '<div style="display:flex;align-items:center;justify-content:center;height:100%;color:#8b949e;font-size:13px;">No attack data</div>';
return;
}
const labels = attackTypes.map(item => item.type);
const counts = attackTypes.map(item => item.count);
const maxCount = Math.max(...counts);
// Hash function to generate consistent color from string
function hashCode(str) {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32bit integer
}
return Math.abs(hash);
}
// Dynamic color generator based on hash
function generateColorFromHash(label) {
const hash = hashCode(label);
const hue = (hash % 360); // 0-360 for hue
const saturation = 70 + (hash % 20); // 70-90 for vibrant colors
const lightness = 50 + (hash % 10); // 50-60 for brightness
const bgColor = `hsl(${hue}, ${saturation}%, ${lightness}%)`;
const borderColor = `hsl(${hue}, ${saturation + 5}%, ${lightness - 10}%)`; // Darker border
const hoverColor = `hsl(${hue}, ${saturation - 10}%, ${lightness + 8}%)`; // Lighter hover
return { bg: bgColor, border: borderColor, hover: hoverColor };
}
// Generate colors dynamically for each attack type
const backgroundColors = labels.map(label => generateColorFromHash(label).bg);
const borderColors = labels.map(label => generateColorFromHash(label).border);
const hoverColors = labels.map(label => generateColorFromHash(label).hover);
// Create or update chart (track per canvas)
if (!loadAttackTypesChart._instances) loadAttackTypesChart._instances = {};
if (loadAttackTypesChart._instances[canvasId]) {
loadAttackTypesChart._instances[canvasId].destroy();
}
const ctx = canvas.getContext('2d');
const chartInstance = new Chart(ctx, {
type: 'doughnut',
data: {
labels: labels,
datasets: [{
data: counts,
backgroundColor: backgroundColors,
borderColor: '#0d1117',
borderWidth: 3,
hoverBorderColor: '#58a6ff',
hoverBorderWidth: 4,
hoverOffset: 10
}]
},
options: {
responsive: true,
maintainAspectRatio: false,
plugins: {
legend: {
position: legendPosition,
labels: {
color: '#c9d1d9',
font: {
size: 12,
weight: '500',
family: "'Segoe UI', Tahoma, Geneva, Verdana"
},
padding: 16,
usePointStyle: true,
pointStyle: 'circle',
generateLabels: (chart) => {
const data = chart.data;
return data.labels.map((label, i) => ({
text: `${label} (${data.datasets[0].data[i]})`,
fillStyle: data.datasets[0].backgroundColor[i],
hidden: false,
index: i,
pointStyle: 'circle'
}));
}
}
},
tooltip: {
enabled: true,
backgroundColor: 'rgba(22, 27, 34, 0.95)',
titleColor: '#58a6ff',
bodyColor: '#c9d1d9',
borderColor: '#58a6ff',
borderWidth: 2,
padding: 14,
titleFont: {
size: 14,
weight: 'bold',
family: "'Segoe UI', Tahoma, Geneva, Verdana"
},
bodyFont: {
size: 13,
family: "'Segoe UI', Tahoma, Geneva, Verdana"
},
caretSize: 8,
caretPadding: 12,
callbacks: {
label: function(context) {
const total = context.dataset.data.reduce((a, b) => a + b, 0);
const percentage = ((context.parsed / total) * 100).toFixed(1);
return `${context.label}: ${percentage}%`;
}
}
}
},
// Chart.js 3+ disables animations with `animation: false`; `{ enabled: false }` is not a recognized option
animation: false,
onHover: (event, activeElements) => {
canvas.style.cursor = activeElements.length > 0 ? 'pointer' : 'default';
}
},
plugins: [{
id: 'customCanvasBackgroundColor',
beforeDraw: (chart) => {
if (chart.ctx) {
chart.ctx.save();
chart.ctx.globalCompositeOperation = 'destination-over';
chart.ctx.fillStyle = 'rgba(0,0,0,0)';
chart.ctx.fillRect(0, 0, chart.width, chart.height);
chart.ctx.restore();
}
}
}]
});
loadAttackTypesChart._instances[canvasId] = chartInstance;
attackTypesChart = chartInstance;
attackTypesChartLoaded = true;
} catch (err) {
console.error('Error loading attack types chart:', err);
}
}
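
The label-to-color mapping inside `loadAttackTypesChart` can be read in isolation. The following is a minimal sketch, assuming the same hash and HSL ranges as the code above; the names `labelHash` and `colorForLabel` are illustrative and do not exist in the PR.

```javascript
// Illustrative helper (not part of the PR): 32-bit string hash, same recurrence as hashCode() above.
function labelHash(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    // (hash * 31 + charCode), truncated to a signed 32-bit integer
    hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0;
  }
  return Math.abs(hash);
}

// Illustrative helper (not part of the PR): deterministic HSL color for a label.
function colorForLabel(label) {
  const h = labelHash(label);
  const hue = h % 360;              // 0-359
  const saturation = 70 + (h % 20); // 70-89
  const lightness = 50 + (h % 10);  // 50-59
  return `hsl(${hue}, ${saturation}%, ${lightness}%)`;
}

console.log(colorForLabel('a')); // hsl(97, 87%, 57%)
```

Because the color depends only on the label text, the same attack type keeps its color across reloads and across the per-IP charts, without maintaining a palette table.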

@@ -0,0 +1,164 @@
// Alpine.js Dashboard Application
document.addEventListener('alpine:init', () => {
Alpine.data('dashboardApp', () => ({
// State
tab: 'overview',
dashboardPath: window.__DASHBOARD_PATH__ || '',
// Banlist dropdown
banlistOpen: false,
// Raw request modal
rawModal: { show: false, content: '', logId: null },
// Map state
mapInitialized: false,
// Chart state
chartLoaded: false,
// IP Insight state
insightIp: null,
init() {
// Handle hash-based tab routing
const hash = window.location.hash.slice(1);
if (hash === 'ip-stats' || hash === 'attacks') {
this.switchToAttacks();
}
// ip-insight tab is only accessible via lens buttons, not direct hash navigation
window.addEventListener('hashchange', () => {
const h = window.location.hash.slice(1);
if (h === 'ip-stats' || h === 'attacks') {
this.switchToAttacks();
} else if (h !== 'ip-insight') {
// Don't switch away from ip-insight via hash if already there
if (this.tab !== 'ip-insight') {
this.switchToOverview();
}
}
});
},
switchToAttacks() {
this.tab = 'attacks';
window.location.hash = '#ip-stats';
// Delay chart initialization to ensure the container is visible
this.$nextTick(() => {
setTimeout(() => {
if (!this.chartLoaded && typeof loadAttackTypesChart === 'function') {
loadAttackTypesChart();
this.chartLoaded = true;
}
}, 200);
});
},
switchToOverview() {
this.tab = 'overview';
window.location.hash = '#overview';
},
switchToIpInsight() {
// Only allow switching if an IP is selected
if (!this.insightIp) return;
this.tab = 'ip-insight';
window.location.hash = '#ip-insight';
},
openIpInsight(ip) {
// Set the IP and load the insight content
this.insightIp = ip;
this.tab = 'ip-insight';
window.location.hash = '#ip-insight';
// Load IP insight content via HTMX
this.$nextTick(() => {
const container = document.getElementById('ip-insight-htmx-container');
if (container && typeof htmx !== 'undefined') {
htmx.ajax('GET', `${this.dashboardPath}/htmx/ip-insight/${encodeURIComponent(ip)}`, {
target: '#ip-insight-htmx-container',
swap: 'innerHTML'
});
}
});
},
async viewRawRequest(logId) {
try {
const resp = await fetch(
`${this.dashboardPath}/api/raw-request/${logId}`,
{ cache: 'no-store' }
);
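if (resp.status === 404) {
alert('Raw request not available');
return;
}
// Treat any other non-2xx status as a failure instead of parsing an error body
if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
const data = await resp.json();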
this.rawModal.content = data.raw_request || 'No content available';
this.rawModal.logId = logId;
this.rawModal.show = true;
} catch (err) {
alert('Failed to load raw request');
}
},
closeRawModal() {
this.rawModal.show = false;
this.rawModal.content = '';
this.rawModal.logId = null;
},
downloadRawRequest() {
if (!this.rawModal.content) return;
const blob = new Blob([this.rawModal.content], { type: 'text/plain' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `raw-request-${this.rawModal.logId || Date.now()}.txt`;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
},
toggleIpDetail(event) {
const row = event.target.closest('tr');
if (!row) return;
const detailRow = row.nextElementSibling;
if (detailRow && detailRow.classList.contains('ip-stats-row')) {
detailRow.style.display =
detailRow.style.display === 'table-row' ? 'none' : 'table-row';
}
},
}));
});
// Global function for opening IP Insight (used by map popups)
window.openIpInsight = function(ip) {
// Find the Alpine component and call openIpInsight
const container = document.querySelector('[x-data="dashboardApp()"]');
if (container) {
// Try Alpine 3.x API first, then fall back to older API
const data = Alpine.$data ? Alpine.$data(container) : (container._x_dataStack && container._x_dataStack[0]);
if (data && typeof data.openIpInsight === 'function') {
data.openIpInsight(ip);
}
}
};
// Utility function for formatting timestamps (used by map popups)
function formatTimestamp(isoTimestamp) {
if (!isoTimestamp) return 'N/A';
const date = new Date(isoTimestamp);
// new Date() does not throw on bad input; it yields an Invalid Date, so check explicitly
if (Number.isNaN(date.getTime())) return isoTimestamp;
return date.toLocaleString('en-US', {
year: 'numeric', month: '2-digit', day: '2-digit',
hour: '2-digit', minute: '2-digit', second: '2-digit', hour12: false
});
}
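
The hash routing in `dashboardApp` (the `init()` check plus the `hashchange` listener) can be summarized as one pure function. This is a sketch under that reading; `resolveTab` is a hypothetical name used only for illustration, and it models just the tab decision, not the chart loading or the `window.location.hash` writes.

```javascript
// Hypothetical helper (not part of the PR): which tab should be active after a hash change.
function resolveTab(hash, currentTab) {
  const h = hash.replace(/^#/, '');
  // Both '#ip-stats' and '#attacks' open the attacks tab.
  if (h === 'ip-stats' || h === 'attacks') return 'attacks';
  // '#ip-insight' never switches tabs by itself, and any other hash
  // leaves an already-open insight view in place.
  if (h === 'ip-insight' || currentTab === 'ip-insight') return currentTab;
  return 'overview';
}

console.log(resolveTab('#attacks', 'overview'));     // attacks
console.log(resolveTab('#ip-insight', 'overview'));  // overview
console.log(resolveTab('#overview', 'ip-insight'));  // ip-insight
```

Keeping the decision separate from the side effects makes the rule that the insight tab is reachable only through the inspect buttons easy to check.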

@@ -0,0 +1,569 @@
// IP Map Visualization
// Extracted from dashboard_template.py (lines ~2978-3348)
let attackerMap = null;
let allIps = [];
let mapMarkers = []; // all marker objects, each tagged with .options.category
let clusterGroup = null; // single shared MarkerClusterGroup
let hiddenCategories = new Set();
const categoryColors = {
attacker: '#f85149',
bad_crawler: '#f0883e',
good_crawler: '#3fb950',
regular_user: '#58a6ff',
unknown: '#8b949e'
};
// Build an SVG ring icon showing the category mix inside a cluster
function createClusterIcon(cluster) {
const children = cluster.getAllChildMarkers();
const counts = {};
children.forEach(m => {
const cat = m.options.category || 'unknown';
counts[cat] = (counts[cat] || 0) + 1;
});
const total = children.length;
const sorted = Object.entries(counts).sort((a, b) => b[1] - a[1]);
let gradientStops = [];
let cumulative = 0;
sorted.forEach(([cat, count]) => {
const start = (cumulative / total) * 360;
cumulative += count;
const end = (cumulative / total) * 360;
const color = categoryColors[cat] || '#8b949e';
gradientStops.push(`${color} ${start.toFixed(1)}deg ${end.toFixed(1)}deg`);
});
const size = Math.max(20, Math.min(44, 20 + Math.log2(total) * 4));
const centerSize = size - 8;
const centerOffset = 4;
const ringWidth = 4;
const radius = (size / 2) - (ringWidth / 2);
const cx = size / 2;
const cy = size / 2;
const gapDeg = 8;
// Build SVG arc segments with gaps - glow layer first, then sharp layer
let glowSegments = '';
let segments = '';
let currentAngle = -90;
sorted.forEach(([cat, count], idx) => {
const sliceDeg = (count / total) * 360;
if (sliceDeg < gapDeg) return;
const startAngle = currentAngle + (gapDeg / 2);
const endAngle = currentAngle + sliceDeg - (gapDeg / 2);
const startRad = (startAngle * Math.PI) / 180;
const endRad = (endAngle * Math.PI) / 180;
const x1 = cx + radius * Math.cos(startRad);
const y1 = cy + radius * Math.sin(startRad);
const x2 = cx + radius * Math.cos(endRad);
const y2 = cy + radius * Math.sin(endRad);
const largeArc = (endAngle - startAngle) > 180 ? 1 : 0;
const color = categoryColors[cat] || '#8b949e';
// Glow layer - subtle
glowSegments += `<path d="M ${x1} ${y1} A ${radius} ${radius} 0 ${largeArc} 1 ${x2} ${y2}" fill="none" stroke="${color}" stroke-width="${ringWidth + 4}" stroke-linecap="round" opacity="0.35" filter="url(#glow)"/>`;
// Sharp layer
segments += `<path d="M ${x1} ${y1} A ${radius} ${radius} 0 ${largeArc} 1 ${x2} ${y2}" fill="none" stroke="${color}" stroke-width="${ringWidth}" stroke-linecap="round"/>`;
currentAngle += sliceDeg;
});
return L.divIcon({
html: `<div style="position:relative;width:${size}px;height:${size}px;">` +
`<svg width="${size}" height="${size}" style="position:absolute;top:0;left:0;overflow:visible;">` +
`<defs><filter id="glow" x="-50%" y="-50%" width="200%" height="200%"><feGaussianBlur stdDeviation="2" result="blur"/></filter></defs>` +
`${glowSegments}${segments}</svg>` +
`<div style="position:absolute;top:${centerOffset}px;left:${centerOffset}px;width:${centerSize}px;height:${centerSize}px;border-radius:50%;background:#0d1117;display:flex;align-items:center;justify-content:center;color:#e6edf3;font-family:'SF Mono',Monaco,Consolas,monospace;font-size:${Math.max(9, centerSize * 0.38)}px;font-weight:600;">${total}</div>` +
`</div>`,
className: 'ip-cluster-icon',
iconSize: L.point(size, size)
});
}
// City coordinates database (major cities worldwide)
const cityCoordinates = {
// United States
'New York': [40.7128, -74.0060], 'Los Angeles': [34.0522, -118.2437],
'San Francisco': [37.7749, -122.4194], 'Chicago': [41.8781, -87.6298],
'Seattle': [47.6062, -122.3321], 'Miami': [25.7617, -80.1918],
'Boston': [42.3601, -71.0589], 'Atlanta': [33.7490, -84.3880],
'Dallas': [32.7767, -96.7970], 'Houston': [29.7604, -95.3698],
'Denver': [39.7392, -104.9903], 'Phoenix': [33.4484, -112.0740],
// Europe
'London': [51.5074, -0.1278], 'Paris': [48.8566, 2.3522],
'Berlin': [52.5200, 13.4050], 'Amsterdam': [52.3676, 4.9041],
'Moscow': [55.7558, 37.6173], 'Rome': [41.9028, 12.4964],
'Madrid': [40.4168, -3.7038], 'Barcelona': [41.3874, 2.1686],
'Milan': [45.4642, 9.1900], 'Vienna': [48.2082, 16.3738],
'Stockholm': [59.3293, 18.0686], 'Oslo': [59.9139, 10.7522],
'Copenhagen': [55.6761, 12.5683], 'Warsaw': [52.2297, 21.0122],
'Prague': [50.0755, 14.4378], 'Budapest': [47.4979, 19.0402],
'Athens': [37.9838, 23.7275], 'Lisbon': [38.7223, -9.1393],
'Brussels': [50.8503, 4.3517], 'Dublin': [53.3498, -6.2603],
'Zurich': [47.3769, 8.5417], 'Geneva': [46.2044, 6.1432],
'Helsinki': [60.1699, 24.9384], 'Bucharest': [44.4268, 26.1025],
'Saint Petersburg': [59.9343, 30.3351], 'Manchester': [53.4808, -2.2426],
'Roubaix': [50.6942, 3.1746], 'Frankfurt': [50.1109, 8.6821],
'Munich': [48.1351, 11.5820], 'Hamburg': [53.5511, 9.9937],
// Asia
'Tokyo': [35.6762, 139.6503], 'Beijing': [39.9042, 116.4074],
'Shanghai': [31.2304, 121.4737], 'Singapore': [1.3521, 103.8198],
'Mumbai': [19.0760, 72.8777], 'Delhi': [28.7041, 77.1025],
'Bangalore': [12.9716, 77.5946], 'Seoul': [37.5665, 126.9780],
'Hong Kong': [22.3193, 114.1694], 'Bangkok': [13.7563, 100.5018],
'Jakarta': [-6.2088, 106.8456], 'Manila': [14.5995, 120.9842],
'Hanoi': [21.0285, 105.8542], 'Ho Chi Minh City': [10.8231, 106.6297],
'Taipei': [25.0330, 121.5654], 'Kuala Lumpur': [3.1390, 101.6869],
'Karachi': [24.8607, 67.0011], 'Islamabad': [33.6844, 73.0479],
'Dhaka': [23.8103, 90.4125], 'Colombo': [6.9271, 79.8612],
// South America
'São Paulo': [-23.5505, -46.6333], 'Rio de Janeiro': [-22.9068, -43.1729],
'Buenos Aires': [-34.6037, -58.3816], 'Bogotá': [4.7110, -74.0721],
'Lima': [-12.0464, -77.0428], 'Santiago': [-33.4489, -70.6693],
// Middle East & Africa
'Cairo': [30.0444, 31.2357], 'Dubai': [25.2048, 55.2708],
'Istanbul': [41.0082, 28.9784], 'Tel Aviv': [32.0853, 34.7818],
'Johannesburg': [-26.2041, 28.0473], 'Lagos': [6.5244, 3.3792],
'Nairobi': [-1.2921, 36.8219], 'Cape Town': [-33.9249, 18.4241],
// Australia & Oceania
'Sydney': [-33.8688, 151.2093], 'Melbourne': [-37.8136, 144.9631],
'Brisbane': [-27.4698, 153.0251], 'Perth': [-31.9505, 115.8605],
'Auckland': [-36.8485, 174.7633],
// Additional cities
'Unknown': null
};
// Country center coordinates (fallback when city not found)
const countryCoordinates = {
'US': [37.1, -95.7], 'GB': [55.4, -3.4], 'CN': [35.9, 104.1], 'RU': [61.5, 105.3],
'JP': [36.2, 138.3], 'DE': [51.2, 10.5], 'FR': [46.6, 2.2], 'IN': [20.6, 78.96],
'BR': [-14.2, -51.9], 'CA': [56.1, -106.3], 'AU': [-25.3, 133.8], 'MX': [23.6, -102.6],
'ZA': [-30.6, 22.9], 'KR': [35.9, 127.8], 'IT': [41.9, 12.6], 'ES': [40.5, -3.7],
'NL': [52.1, 5.3], 'SE': [60.1, 18.6], 'CH': [46.8, 8.2], 'PL': [51.9, 19.1],
'SG': [1.4, 103.8], 'HK': [22.4, 114.1], 'TW': [23.7, 120.96], 'TH': [15.9, 100.9],
'VN': [14.1, 108.8], 'ID': [-0.8, 113.2], 'PH': [12.9, 121.8], 'MY': [4.2, 101.7],
'PK': [30.4, 69.2], 'BD': [23.7, 90.4], 'NG': [9.1, 8.7], 'EG': [26.8, 30.8],
'TR': [38.9, 35.2], 'IR': [32.4, 53.7], 'AE': [23.4, 53.8], 'KZ': [48.0, 66.9],
'UA': [48.4, 31.2], 'BG': [42.7, 25.5], 'RO': [45.9, 24.97], 'CZ': [49.8, 15.5],
'HU': [47.2, 19.5], 'AT': [47.5, 14.6], 'BE': [50.5, 4.5], 'DK': [56.3, 9.5],
'FI': [61.9, 25.8], 'NO': [60.5, 8.5], 'GR': [39.1, 21.8], 'PT': [39.4, -8.2],
'AR': [-38.4161, -63.6167], 'CO': [4.5709, -74.2973], 'CL': [-35.6751, -71.5430],
'PE': [-9.1900, -75.0152], 'VE': [6.4238, -66.5897], 'LS': [-29.6, 28.2]
};
// Helper function to get coordinates for an IP
function getIPCoordinates(ip) {
if (ip.latitude != null && ip.longitude != null) {
return [ip.latitude, ip.longitude];
}
if (ip.city && cityCoordinates[ip.city]) {
return cityCoordinates[ip.city];
}
if (ip.country_code && countryCoordinates[ip.country_code]) {
return countryCoordinates[ip.country_code];
}
return null;
}
// Fetch IPs from the API, handling pagination for "all"
async function fetchIpsForMap(limit) {
const DASHBOARD_PATH = window.__DASHBOARD_PATH__ || '';
const headers = { 'Cache-Control': 'no-cache', 'Pragma': 'no-cache' };
if (limit === 'all') {
// Fetch in pages of 1000 until we have everything
let collected = [];
let page = 1;
const pageSize = 1000;
while (true) {
const response = await fetch(
`${DASHBOARD_PATH}/api/all-ips?page=${page}&page_size=${pageSize}&sort_by=total_requests&sort_order=desc`,
{ cache: 'no-store', headers }
);
if (!response.ok) throw new Error('Failed to fetch IPs');
const data = await response.json();
collected = collected.concat(data.ips || []);
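// Stop at the last page; default to 1 so a missing pagination block cannot loop forever
const totalPages = (data.pagination && data.pagination.total_pages) || 1;
if (page >= totalPages) break;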
page++;
}
return collected;
}
const pageSize = parseInt(limit, 10);
const response = await fetch(
`${DASHBOARD_PATH}/api/all-ips?page=1&page_size=${pageSize}&sort_by=total_requests&sort_order=desc`,
{ cache: 'no-store', headers }
);
if (!response.ok) throw new Error('Failed to fetch IPs');
const data = await response.json();
return data.ips || [];
}
// Build markers from an IP list and add them to the map
function buildMapMarkers(ips) {
// Clear existing cluster group
if (clusterGroup) {
attackerMap.removeLayer(clusterGroup);
clusterGroup.clearLayers();
}
mapMarkers = [];
// Single cluster group with custom pie-chart icons
clusterGroup = L.markerClusterGroup({
maxClusterRadius: 35,
spiderfyOnMaxZoom: true,
showCoverageOnHover: false,
zoomToBoundsOnClick: true,
disableClusteringAtZoom: 8,
iconCreateFunction: createClusterIcon
});
// Track used coordinates to add small offsets for overlapping markers
const usedCoordinates = {};
function getUniqueCoordinates(baseCoords) {
const key = `${baseCoords[0].toFixed(4)},${baseCoords[1].toFixed(4)}`;
if (!usedCoordinates[key]) {
usedCoordinates[key] = 0;
}
usedCoordinates[key]++;
if (usedCoordinates[key] === 1) {
return baseCoords;
}
const angle = (usedCoordinates[key] * 137.5) % 360;
const distance = 0.05 * Math.sqrt(usedCoordinates[key]);
const latOffset = distance * Math.cos(angle * Math.PI / 180);
const lngOffset = distance * Math.sin(angle * Math.PI / 180);
return [
baseCoords[0] + latOffset,
baseCoords[1] + lngOffset
];
}
const DASHBOARD_PATH = window.__DASHBOARD_PATH__ || '';
ips.forEach(ip => {
if (!ip.country_code || !ip.category) return;
const baseCoords = getIPCoordinates(ip);
if (!baseCoords) return;
const coords = getUniqueCoordinates(baseCoords);
const category = ip.category.toLowerCase();
if (!categoryColors[category]) return;
const requestsForScale = Math.min(ip.total_requests, 10000);
const sizeRatio = Math.pow(requestsForScale / 10000, 0.5);
const markerSize = Math.max(10, Math.min(30, 10 + (sizeRatio * 20)));
const markerElement = document.createElement('div');
markerElement.className = `ip-marker marker-${category}`;
markerElement.style.width = markerSize + 'px';
markerElement.style.height = markerSize + 'px';
markerElement.style.fontSize = (markerSize * 0.5) + 'px';
markerElement.textContent = '\u25CF';
const marker = L.marker(coords, {
icon: L.divIcon({
html: markerElement.outerHTML,
iconSize: [markerSize, markerSize],
className: `ip-custom-marker category-${category}`
}),
category: category
});
const categoryColor = categoryColors[category] || '#8b949e';
const categoryLabels = {
attacker: 'Attacker',
bad_crawler: 'Bad Crawler',
good_crawler: 'Good Crawler',
regular_user: 'Regular User',
unknown: 'Unknown'
};
marker.bindPopup('', {
maxWidth: 550,
className: 'ip-detail-popup'
});
marker.on('click', async function(e) {
const loadingPopup = `
<div style="padding: 12px; min-width: 280px; max-width: 320px;">
<div style="display: flex; align-items: center; justify-content: space-between; margin-bottom: 8px;">
<strong style="color: #58a6ff; font-size: 14px;">${ip.ip}</strong>
<span style="background: ${categoryColor}1a; color: ${categoryColor}; padding: 2px 8px; border-radius: 12px; font-size: 11px; font-weight: 600;">
${categoryLabels[category]}
</span>
</div>
<div style="text-align: center; padding: 20px; color: #8b949e;">
<div style="font-size: 12px;">Loading details...</div>
</div>
</div>
`;
marker.setPopupContent(loadingPopup);
marker.openPopup();
try {
const response = await fetch(`${DASHBOARD_PATH}/api/ip-stats/${encodeURIComponent(ip.ip)}`);
if (!response.ok) throw new Error('Failed to fetch IP stats');
const stats = await response.json();
let popupContent = `
<div style="padding: 12px; min-width: 200px;">
<div style="display: flex; align-items: center; justify-content: space-between; margin-bottom: 4px;">
<strong style="color: #58a6ff; font-size: 14px;">${ip.ip}</strong>
<button onclick="window.openIpInsight('${ip.ip}')" class="inspect-btn" style="display: inline-flex; align-items: center; padding: 4px; background: none; color: #8b949e; border: none; cursor: pointer; border-radius: 4px;" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 16 16" fill="currentColor"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</div>
<div style="margin-bottom: 8px;">
<span style="background: ${categoryColor}1a; color: ${categoryColor}; padding: 2px 8px; border-radius: 12px; font-size: 11px; font-weight: 600;">
${categoryLabels[category]}
</span>
</div>
<span style="color: #8b949e; font-size: 12px;">
${ip.city ? (ip.country_code ? `${ip.city}, ${ip.country_code}` : ip.city) : (ip.country_code || 'Unknown')}
</span><br/>
<div style="margin-top: 8px; border-top: 1px solid #30363d; padding-top: 8px;">
<div style="margin-bottom: 4px;"><span style="color: #8b949e;">Requests:</span> <span style="color: ${categoryColor}; font-weight: bold;">${ip.total_requests}</span></div>
<div style="margin-bottom: 4px;"><span style="color: #8b949e;">First Seen:</span> <span style="color: #58a6ff; font-size: 11px;">${formatTimestamp(ip.first_seen)}</span></div>
<div style="margin-bottom: 4px;"><span style="color: #8b949e;">Last Seen:</span> <span style="color: #58a6ff; font-size: 11px;">${formatTimestamp(ip.last_seen)}</span></div>
</div>
`;
if (stats.category_scores && Object.keys(stats.category_scores).length > 0) {
const chartHtml = generateMapPanelRadarChart(stats.category_scores);
popupContent += `
<div style="margin-top: 12px; border-top: 1px solid #30363d; padding-top: 12px;">
${chartHtml}
</div>
`;
}
popupContent += '</div>';
marker.setPopupContent(popupContent);
} catch (err) {
console.error('Error fetching IP stats:', err);
const errorPopup = `
<div style="padding: 12px; min-width: 280px; max-width: 320px;">
<div style="display: flex; align-items: center; justify-content: space-between; margin-bottom: 4px;">
<strong style="color: #58a6ff; font-size: 14px;">${ip.ip}</strong>
<button onclick="window.openIpInsight('${ip.ip}')" class="inspect-btn" style="display: inline-flex; align-items: center; padding: 4px; background: none; color: #8b949e; border: none; cursor: pointer; border-radius: 4px;" title="Inspect IP">
<svg xmlns="http://www.w3.org/2000/svg" width="14" height="14" viewBox="0 0 16 16" fill="currentColor"><path d="M10.68 11.74a6 6 0 0 1-7.922-8.982 6 6 0 0 1 8.982 7.922l3.04 3.04a.749.749 0 0 1-.326 1.275.749.749 0 0 1-.734-.215ZM11.5 7a4.499 4.499 0 1 0-8.997 0A4.499 4.499 0 0 0 11.5 7Z"/></svg>
</button>
</div>
<div style="margin-bottom: 8px;">
<span style="background: ${categoryColor}1a; color: ${categoryColor}; padding: 2px 8px; border-radius: 12px; font-size: 11px; font-weight: 600;">
${categoryLabels[category]}
</span>
</div>
<span style="color: #8b949e; font-size: 12px;">
${ip.city ? (ip.country_code ? `${ip.city}, ${ip.country_code}` : ip.city) : (ip.country_code || 'Unknown')}
</span><br/>
<div style="margin-top: 8px; border-top: 1px solid #30363d; padding-top: 8px;">
<div style="margin-bottom: 4px;"><span style="color: #8b949e;">Requests:</span> <span style="color: ${categoryColor}; font-weight: bold;">${ip.total_requests}</span></div>
<div style="margin-bottom: 4px;"><span style="color: #8b949e;">First Seen:</span> <span style="color: #58a6ff; font-size: 11px;">${formatTimestamp(ip.first_seen)}</span></div>
<div style="margin-bottom: 4px;"><span style="color: #8b949e;">Last Seen:</span> <span style="color: #58a6ff; font-size: 11px;">${formatTimestamp(ip.last_seen)}</span></div>
</div>
<div style="margin-top: 12px; border-top: 1px solid #30363d; padding-top: 12px; text-align: center; color: #f85149; font-size: 11px;">
Failed to load chart: ${err.message}
</div>
</div>
`;
marker.setPopupContent(errorPopup);
}
});
mapMarkers.push(marker);
// Only add to cluster if category is not hidden
if (!hiddenCategories.has(category)) {
clusterGroup.addLayer(marker);
}
});
attackerMap.addLayer(clusterGroup);
// Fit map to visible markers
const visibleMarkers = mapMarkers.filter(m => !hiddenCategories.has(m.options.category));
if (visibleMarkers.length > 0) {
const bounds = L.featureGroup(visibleMarkers).getBounds();
attackerMap.fitBounds(bounds, { padding: [50, 50] });
}
}
async function initializeAttackerMap() {
const mapContainer = document.getElementById('attacker-map');
if (!mapContainer || attackerMap) return;
try {
attackerMap = L.map('attacker-map', {
center: [20, 0],
zoom: 2,
layers: [
L.tileLayer('https://{s}.basemaps.cartocdn.com/dark_all/{z}/{x}/{y}{r}.png', {
attribution: '&copy; CartoDB | &copy; OpenStreetMap contributors',
maxZoom: 19,
subdomains: 'abcd'
})
]
});
// Get the selected limit from the dropdown (default 100)
const limitSelect = document.getElementById('map-ip-limit');
const limit = limitSelect ? limitSelect.value : '100';
allIps = await fetchIpsForMap(limit);
if (allIps.length === 0) {
mapContainer.innerHTML = '<div style="display: flex; align-items: center; justify-content: center; height: 100%; color: #8b949e;">No IP location data available</div>';
return;
}
buildMapMarkers(allIps);
// Force Leaflet to recalculate container size after the tab becomes visible.
setTimeout(() => {
if (attackerMap) attackerMap.invalidateSize();
}, 300);
} catch (err) {
console.error('Error initializing attacker map:', err);
mapContainer.innerHTML = '<div style="display: flex; align-items: center; justify-content: center; height: 100%; color: #f85149;">Failed to load map: ' + err.message + '</div>';
}
}
// Reload map markers when the user changes the IP limit selector
async function reloadMapWithLimit(limit) {
if (!attackerMap) return;
// Show loading state
const mapContainer = document.getElementById('attacker-map');
const overlay = document.createElement('div');
overlay.id = 'map-loading-overlay';
overlay.style.cssText = 'position:absolute;top:0;left:0;right:0;bottom:0;background:rgba(13,17,23,0.7);display:flex;align-items:center;justify-content:center;z-index:1000;color:#8b949e;font-size:14px;';
overlay.textContent = 'Loading IPs...';
mapContainer.style.position = 'relative';
mapContainer.appendChild(overlay);
try {
allIps = await fetchIpsForMap(limit);
buildMapMarkers(allIps);
} catch (err) {
console.error('Error reloading map:', err);
} finally {
const existing = document.getElementById('map-loading-overlay');
if (existing) existing.remove();
}
}
// Update map filters based on checkbox selection
function updateMapFilters() {
if (!attackerMap || !clusterGroup) return;
hiddenCategories.clear();
document.querySelectorAll('.map-filter').forEach(cb => {
const category = cb.getAttribute('data-category');
if (category && !cb.checked) hiddenCategories.add(category);
});
// Rebuild cluster group with only visible markers
clusterGroup.clearLayers();
const visible = mapMarkers.filter(m => !hiddenCategories.has(m.options.category));
clusterGroup.addLayers(visible);
}
// Generate radar chart SVG for map panel popups
function generateMapPanelRadarChart(categoryScores) {
if (!categoryScores || Object.keys(categoryScores).length === 0) {
return '<div style="color: #8b949e; text-align: center; padding: 20px;">No category data available</div>';
}
let html = '<div style="display: flex; flex-direction: column; align-items: center;">';
html += '<svg class="radar-chart" viewBox="-30 -30 260 260" preserveAspectRatio="xMidYMid meet" style="width: 160px; height: 160px;">';
const scores = {
attacker: categoryScores.attacker || 0,
good_crawler: categoryScores.good_crawler || 0,
bad_crawler: categoryScores.bad_crawler || 0,
regular_user: categoryScores.regular_user || 0,
unknown: categoryScores.unknown || 0
};
const maxScore = Math.max(...Object.values(scores), 1);
const minVisibleRadius = 0.15;
const normalizedScores = {};
Object.keys(scores).forEach(key => {
normalizedScores[key] = minVisibleRadius + (scores[key] / maxScore) * (1 - minVisibleRadius);
});
const colors = {
attacker: '#f85149',
good_crawler: '#3fb950',
bad_crawler: '#f0883e',
regular_user: '#58a6ff',
unknown: '#8b949e'
};
const labels = {
attacker: 'Attacker',
good_crawler: 'Good Bot',
bad_crawler: 'Bad Bot',
regular_user: 'User',
unknown: 'Unknown'
};
const cx = 100, cy = 100, maxRadius = 75;
for (let i = 1; i <= 5; i++) {
const r = (maxRadius / 5) * i;
html += `<circle cx="${cx}" cy="${cy}" r="${r}" fill="none" stroke="#30363d" stroke-width="0.5"/>`;
}
const angles = [0, 72, 144, 216, 288];
const keys = ['good_crawler', 'regular_user', 'unknown', 'bad_crawler', 'attacker'];
angles.forEach((angle, i) => {
const rad = (angle - 90) * Math.PI / 180;
const x2 = cx + maxRadius * Math.cos(rad);
const y2 = cy + maxRadius * Math.sin(rad);
html += `<line x1="${cx}" y1="${cy}" x2="${x2}" y2="${y2}" stroke="#30363d" stroke-width="0.5"/>`;
const labelDist = maxRadius + 35;
const lx = cx + labelDist * Math.cos(rad);
const ly = cy + labelDist * Math.sin(rad);
html += `<text x="${lx}" y="${ly}" fill="#8b949e" font-size="12" text-anchor="middle" dominant-baseline="middle">${labels[keys[i]]}</text>`;
});
let points = [];
angles.forEach((angle, i) => {
const normalizedScore = normalizedScores[keys[i]];
const rad = (angle - 90) * Math.PI / 180;
const r = normalizedScore * maxRadius;
const x = cx + r * Math.cos(rad);
const y = cy + r * Math.sin(rad);
points.push(`${x},${y}`);
});
const dominantKey = Object.keys(scores).reduce((a, b) => scores[a] > scores[b] ? a : b);
const dominantColor = colors[dominantKey];
html += `<polygon points="${points.join(' ')}" fill="${dominantColor}" fill-opacity="0.4" stroke="${dominantColor}" stroke-width="2.5"/>`;
angles.forEach((angle, i) => {
const normalizedScore = normalizedScores[keys[i]];
const rad = (angle - 90) * Math.PI / 180;
const r = normalizedScore * maxRadius;
const x = cx + r * Math.cos(rad);
const y = cy + r * Math.sin(rad);
html += `<circle cx="${x}" cy="${y}" r="4.5" fill="${colors[keys[i]]}" stroke="#0d1117" stroke-width="2"/>`;
});
html += '</svg>';
html += '</div>';
return html;
}
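
The overlap handling in `buildMapMarkers` places repeated coordinates on a golden-angle spiral around the original point. The following is a minimal standalone sketch of that idea, assuming the same 137.5 degree step and square-root distance growth as `getUniqueCoordinates`; `makeCoordinateSpreader` and its `step` parameter are illustrative names, not part of the PR.

```javascript
// Illustrative helper (not part of the PR): returns a function that nudges
// repeated coordinates outward along a golden-angle spiral.
function makeCoordinateSpreader(step = 0.05) {
  const seen = new Map(); // rounded "lat,lng" key -> number of times requested
  return function spread([lat, lng]) {
    const key = `${lat.toFixed(4)},${lng.toFixed(4)}`;
    const n = (seen.get(key) || 0) + 1;
    seen.set(key, n);
    if (n === 1) return [lat, lng]; // first marker keeps the exact position
    const angle = ((n * 137.5) % 360) * Math.PI / 180;
    const distance = step * Math.sqrt(n); // later markers sit further out
    return [lat + distance * Math.cos(angle), lng + distance * Math.sin(angle)];
  };
}

const spread = makeCoordinateSpreader();
const first = spread([45.4642, 9.19]);  // unchanged
const second = spread([45.4642, 9.19]); // offset by about 0.07 degrees
console.log(first, second);
```

The offsets are in degrees, so the visual spread shrinks toward the poles in longitude; for markers that only need to be distinguishable at city zoom levels this is usually acceptable.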

Some files were not shown because too many files have changed in this diff