- Implement export-malicious-ips task that queries distinct IPs flagged
as is_suspicious from database and writes to exports/malicious_ips.txt
- Add exports volume mount to docker-compose.yaml for host persistence
- Update entrypoint.sh to fix ownership of exports directory for krawl user
- Update Dockerfile to create /app/exports directory during build
Other tasks can be added by creating them in the tasks dir using the same setup as this task.
All tasks *MUST* include a TASK_CONFIG dict and a main method in the file to work correctly.
- Add SQLAlchemy-based database layer for persistent storage
- Create models for access_logs, credential_attempts, attack_detections, ip_stats
- Include fields for future GeoIP and reputation enrichment
- Implement sanitization utilities to protect against malicious payloads
- Fix XSS vulnerability in dashboard template (HTML escape all user data)
- Add DATABASE_PATH and DATABASE_RETENTION_DAYS config options
- Dual storage: in-memory for dashboard performance + SQLite for persistence
New files:
- src/models.py - SQLAlchemy ORM models
- src/database.py - DatabaseManager singleton
- src/sanitizer.py - Input sanitization and HTML escaping
- requirements.txt - SQLAlchemy dependency
Security protections:
- Parameterized queries via SQLAlchemy ORM
- Field length limits to prevent storage exhaustion
- Null byte and control character stripping
- HTML escaping on dashboard output
Implement a centralized logging singleton using Python's built-in
logging module with RotatingFileHandler. Replaces all print()
statements with structured logging.
- Create LoggerManager singleton in src/logger.py
- Add two loggers: app (krawl.log) and access (access.log)
- Configure 1MB file rotation with 5 backups
- Output to both files and stdout for container compatibility
- Update handler.py, server.py, wordlists.py to use new loggers
Benefits over print():
- Persistent logs survive restarts for forensic analysis
- Automatic rotation prevents unbounded disk growth
- Separate access/app logs for easier analysis and SIEM integration
- Consistent timestamps and log levels across all messages
- Configurable verbosity without code changes
Add SERVER_HEADER environment variable to customize the HTTP Server
response header, defaulting to Apache/2.2.22 (Ubuntu). This allows the
honeypot to masquerade as different web servers to attract attackers.
- Add server_header field to Config dataclass
- Override version_string() in Handler to return configured header
- Update documentation and all deployment configs