Configuration override from environment variable (#47)

* Add environment variable override for config fields

Introduces functions to override configuration fields from environment variables, allowing dynamic configuration without modifying YAML files. The environment variable names are generated from field names, and type conversion is handled for int, float, and tuple fields.

* update chart version to 0.1.4

* Update README.md to enhance environment variable configuration details and improve overall clarity
Lorenzo Venerandi
2026-01-23 17:34:23 +01:00
committed by GitHub
parent e1444e44ee
commit 223883a781
3 changed files with 401 additions and 326 deletions
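
The naming rule described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `env_name_for` and its `prefix` parameter are hypothetical names, and the replacement rules are taken from the README text in this commit.

```python
def env_name_for(field_name: str, prefix: str = "KRAWL_") -> str:
    """Map a configuration field name to an environment variable name.

    Hypothetical helper: uppercase the name, then turn '.' and ' ' into a
    single underscore and '-' into a double underscore.
    """
    normalized = (
        field_name.upper()
        .replace(".", "_")
        .replace("-", "__")
        .replace(" ", "_")
    )
    return f"{prefix}{normalized}"


# Field names from the README table, plus one made-up name that exercises
# all three special characters.
examples = ["port", "links_length_range", "some.nested-field name"]
mapping = {name: env_name_for(name) for name in examples}
```

Because an underscore in a field name is left untouched, names such as `a.b` and `a_b` map to the same variable; the current configuration fields do not appear to collide in this way.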


@@ -171,23 +171,71 @@ To access the dashboard
 ## Configuration via Environment Variables
-To customize the deception server installation several **environment variables** can be specified.
+To customize the deception server installation, environment variables can be specified using the naming convention: `KRAWL_<FIELD_NAME>` where `<FIELD_NAME>` is the configuration field name in uppercase with special characters converted:
+- `.` → `_`
+- `-` → `__` (double underscore)
+- ` ` (space) → `_`
-| Variable | Description | Default |
-|----------|-------------|---------|
-| `PORT` | Server listening port | `5000` |
-| `DELAY` | Response delay in milliseconds | `100` |
-| `LINKS_MIN_LENGTH` | Minimum random link length | `5` |
-| `LINKS_MAX_LENGTH` | Maximum random link length | `15` |
-| `LINKS_MIN_PER_PAGE` | Minimum links per page | `10` |
-| `LINKS_MAX_PER_PAGE` | Maximum links per page | `15` |
-| `MAX_COUNTER` | Initial counter value | `10` |
-| `CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` |
-| `CANARY_TOKEN_URL` | External canary token URL | None |
-| `DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated |
-| `PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` |
-| `SERVER_HEADER` | HTTP Server header for deception | `Apache/2.2.22 (Ubuntu)` |
-| `TIMEZONE` | IANA timezone for logs and dashboard (e.g., `America/New_York`, `Europe/Rome`) | System timezone |
+### Configuration Variables
+| Configuration Field | Environment Variable | Description | Default |
+|-----------|-----------|-------------|---------|
+| `port` | `KRAWL_PORT` | Server listening port | `5000` |
+| `delay` | `KRAWL_DELAY` | Response delay in milliseconds | `100` |
+| `server_header` | `KRAWL_SERVER_HEADER` | HTTP Server header for deception | `""` |
+| `links_length_range` | `KRAWL_LINKS_LENGTH_RANGE` | Link length range as `min,max` | `5,15` |
+| `links_per_page_range` | `KRAWL_LINKS_PER_PAGE_RANGE` | Links per page as `min,max` | `10,15` |
+| `char_space` | `KRAWL_CHAR_SPACE` | Characters used for link generation | `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789` |
+| `max_counter` | `KRAWL_MAX_COUNTER` | Initial counter value | `10` |
+| `canary_token_url` | `KRAWL_CANARY_TOKEN_URL` | External canary token URL | None |
+| `canary_token_tries` | `KRAWL_CANARY_TOKEN_TRIES` | Requests before showing canary token | `10` |
+| `dashboard_secret_path` | `KRAWL_DASHBOARD_SECRET_PATH` | Custom dashboard path | Auto-generated |
+| `api_server_url` | `KRAWL_API_SERVER_URL` | API server URL | None |
+| `api_server_port` | `KRAWL_API_SERVER_PORT` | API server port | `8080` |
+| `api_server_path` | `KRAWL_API_SERVER_PATH` | API server endpoint path | `/api/v2/users` |
+| `probability_error_codes` | `KRAWL_PROBABILITY_ERROR_CODES` | Error response probability (0-100%) | `0` |
+| `database_path` | `KRAWL_DATABASE_PATH` | Database file location | `data/krawl.db` |
+| `database_retention_days` | `KRAWL_DATABASE_RETENTION_DAYS` | Days to retain data in database | `30` |
+| `http_risky_methods_threshold` | `KRAWL_HTTP_RISKY_METHODS_THRESHOLD` | Threshold for risky HTTP methods detection | `0.1` |
+| `violated_robots_threshold` | `KRAWL_VIOLATED_ROBOTS_THRESHOLD` | Threshold for robots.txt violations | `0.1` |
+| `uneven_request_timing_threshold` | `KRAWL_UNEVEN_REQUEST_TIMING_THRESHOLD` | Coefficient of variation threshold for timing | `0.5` |
+| `uneven_request_timing_time_window_seconds` | `KRAWL_UNEVEN_REQUEST_TIMING_TIME_WINDOW_SECONDS` | Time window for request timing analysis in seconds | `300` |
+| `user_agents_used_threshold` | `KRAWL_USER_AGENTS_USED_THRESHOLD` | Threshold for detecting multiple user agents | `2` |
+| `attack_urls_threshold` | `KRAWL_ATTACK_URLS_THRESHOLD` | Threshold for attack URL detection | `1` |
+### Examples
+```bash
+# Set port and delay
+export KRAWL_PORT=8080
+export KRAWL_DELAY=200
+# Set canary token
+export KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url"
+# Set tuple values (min,max format)
+export KRAWL_LINKS_LENGTH_RANGE="3,20"
+export KRAWL_LINKS_PER_PAGE_RANGE="5,25"
+# Set analyzer thresholds
+export KRAWL_HTTP_RISKY_METHODS_THRESHOLD="0.2"
+export KRAWL_VIOLATED_ROBOTS_THRESHOLD="0.15"
+# Set custom dashboard path
+export KRAWL_DASHBOARD_SECRET_PATH="/my-secret-dashboard"
+```
+Or in Docker:
+```bash
+docker run -d \
+  -p 5000:5000 \
+  -e KRAWL_PORT=5000 \
+  -e KRAWL_DELAY=100 \
+  -e KRAWL_CANARY_TOKEN_URL="http://your-canary-token-url" \
+  --name krawl \
+  ghcr.io/blessedrebus/krawl:latest
+```
 ## robots.txt
 The actual (juicy) robots.txt configuration is the following
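
The README documents range-valued variables such as `KRAWL_LINKS_LENGTH_RANGE` as `min,max`. A strict parser for that format might look like the sketch below. `parse_int_range` is a hypothetical helper, and both rejecting malformed input with `ValueError` and requiring `min <= max` are assumptions of this sketch; the code in this commit skips values that do not split into exactly two parts.

```python
from typing import Tuple


def parse_int_range(raw: str) -> Tuple[int, int]:
    """Parse a 'min,max' string into a pair of integers.

    Hypothetical helper: raises ValueError for anything that is not two
    comma-separated integers in non-decreasing order.
    """
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected 'min,max', got {raw!r}")
    low, high = int(parts[0]), int(parts[1])  # int() raises ValueError on non-numeric text
    if low > high:
        raise ValueError(f"min must not exceed max, got {raw!r}")
    return low, high


# Value taken from the README example above.
links_length_range = parse_int_range("3,20")
```

Failing loudly at startup is one possible design choice; it makes a typo in a deployment manifest visible instead of leaving the YAML default silently in place.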


@@ -2,7 +2,7 @@ apiVersion: v2
 name: krawl-chart
 description: A Helm chart for Krawl honeypot server
 type: application
-version: 0.1.3
+version: 0.1.4
 appVersion: 0.1.6
 keywords:
   - honeypot


@@ -111,13 +111,40 @@ class Config:
             attack_urls_threshold=analyzer.get('attack_urls_threshold', 1)
         )
+def __get_env_from_config(config: str) -> str:
+    env = config.upper().replace('.', '_').replace('-', '__').replace(' ', '_')
+    return f'KRAWL_{env}'
+def override_config_from_env(config: Config = None):
+    """Initialize configuration from environment variables"""
+    for field in config.__dataclass_fields__:
+        env_var = __get_env_from_config(field)
+        if env_var in os.environ:
+            field_type = config.__dataclass_fields__[field].type
+            env_value = os.environ[env_var]
+            if field_type == int:
+                setattr(config, field, int(env_value))
+            elif field_type == float:
+                setattr(config, field, float(env_value))
+            elif field_type == Tuple[int, int]:
+                parts = env_value.split(',')
+                if len(parts) == 2:
+                    setattr(config, field, (int(parts[0]), int(parts[1])))
+            else:
+                setattr(config, field, env_value)
 _config_instance = None
 def get_config() -> Config:
     """Get the singleton Config instance"""
     global _config_instance
     if _config_instance is None:
         _config_instance = Config.from_yaml()
+        override_config_from_env(_config_instance)
     return _config_instance
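
To try the override logic without the rest of the project, a self-contained sketch along the lines of this diff is shown below. `DemoConfig` and `apply_env_overrides` are hypothetical stand-ins, not names from the repository. Two things differ from the committed code by assumption: the environment is passed in as a mapping so it can be tested without touching `os.environ`, and a malformed `min,max` value raises instead of being skipped.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional, Tuple


@dataclass
class DemoConfig:
    # Hypothetical stand-in for Config; only a few representative fields.
    port: int = 5000
    http_risky_methods_threshold: float = 0.1
    links_length_range: Tuple[int, int] = (5, 15)
    server_header: str = ""


def apply_env_overrides(config, environ: Optional[Mapping[str, str]] = None,
                        prefix: str = "KRAWL_"):
    """Override dataclass fields from KRAWL_* variables, converting by field type."""
    environ = os.environ if environ is None else environ
    for f in fields(config):
        env_var = prefix + f.name.upper().replace(".", "_").replace("-", "__").replace(" ", "_")
        if env_var not in environ:
            continue  # keep the value loaded from YAML (here: the dataclass default)
        raw = environ[env_var]
        if f.type is int:
            value = int(raw)
        elif f.type is float:
            value = float(raw)
        elif f.type == Tuple[int, int]:
            parts = raw.split(",")
            if len(parts) != 2:
                raise ValueError(f"{env_var}: expected 'min,max', got {raw!r}")
            value = (int(parts[0]), int(parts[1]))
        else:
            value = raw  # strings and any other type are stored as-is
        setattr(config, f.name, value)
    return config


demo = apply_env_overrides(
    DemoConfig(),
    {"KRAWL_PORT": "8080", "KRAWL_LINKS_LENGTH_RANGE": "3,20"},
)
```

The type comparisons rely on the dataclass annotations being real type objects. If the defining module used `from __future__ import annotations`, each `field.type` would be a string and every value would fall through to the string branch, so that is worth checking before reusing this pattern.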