feat: Integrate tqdm for progress tracking during bot list fetching
feat: Add regex filters to exclude IP addresses and domains from bot lists
refactor: Remove IP and domain-specific sources from BOT_LIST_SOURCES
refactor: Update parse_bot_list to skip lines matching IP or domain regex
refactor: Improve logging for better debugging and clarity
refactor: Use ThreadPoolExecutor with tqdm for concurrent fetching with progress
docs: Add comments and docstrings for better code understanding
chore: Ensure output directories exist before generating WAF configurations
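
A minimal sketch of the IP/domain filtering described above. The regex patterns, the comment-skipping rule, and the `parse_bot_list(text)` signature are assumptions for illustration; the patterns actually committed may differ.

```python
import re

# Hypothetical patterns: an IPv4 address with optional CIDR suffix, and a bare domain name.
IP_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}(?:/\d{1,2})?$")
DOMAIN_RE = re.compile(
    r"^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$", re.IGNORECASE
)


def parse_bot_list(text):
    """Return bot names from raw list text, skipping blanks, comments, IPs, and domains."""
    bots = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # blank line or comment
        if IP_RE.match(entry) or DOMAIN_RE.match(entry):
            continue  # not a user-agent name
        bots.append(entry)
    return bots
```

Anchoring both patterns with `^` and `$` keeps user-agent strings that merely contain a dot or a slash, such as a versioned bot name, from being dropped.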
* feat: add CLI support for output file and Git reference
* feat: implement atomic file writes for saving JSON
* feat: add dry-run mode to simulate fetching without saving
* feat: increase connection pool size to avoid "Connection pool is full" warnings
* feat: add progress bar for fetching and processing rule files
* feat: add retries for SHA verification in case of transient errors
* refactor: improve error handling for connection pool-related errors
* refactor: use ThreadPoolExecutor for parallel fetching of rule files
* refactor: improve logging with structured messages
* fix: handle edge cases in tag fetching logic
* fix: handle empty blob content gracefully
* fix: improve SHA verification logging
* docs: add comments and docstrings for better code readability
* chore: update requirements.txt to include tqdm
* test: add unit tests for critical functions
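
One common way to get the atomic JSON write mentioned above is to write to a temporary file in the target directory and then rename it into place. The sketch below assumes that approach; the function name `save_json_atomic` is hypothetical and not taken from the commit.

```python
import json
import os
import tempfile
from pathlib import Path


def save_json_atomic(data, output_path):
    """Write data as JSON so readers never observe a partially written file."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=output_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites the destination in a single rename step.
        os.replace(tmp_name, output_path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A dry-run mode could simply skip the call to this function after fetching.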
- Error Handling:
  - Added error handling for file operations, JSON parsing, and invalid rule structures.
  - Logs warnings for invalid rules instead of crashing.
- Path Handling:
  - Used pathlib.Path for better path manipulation and readability.
  - Made paths configurable via environment variables.
- Logging:
  - Replaced print() with Python's logging module for more flexible and structured logging.
- Input Validation:
  - Added checks for missing keys in the input JSON file.
- Rule Formatting:
  - Ensured proper formatting of HAProxy ACL rules.
- Output Directory Permissions:
  - Ensured the output directory is created with parents=True to handle nested directories.
- Code Structure:
  - Encapsulated the main logic in a main() function for better organization.
  - Added docstrings to functions for clarity.
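
The validation and logging behaviour listed above could look roughly like the following sketch. The required key names (`id`, `pattern`) and the function name `validate_rules` are assumptions; the real input schema is not shown in this commit message.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical schema: the keys every rule object is expected to carry.
REQUIRED_KEYS = ("id", "pattern")


def validate_rules(rules):
    """Return only well-formed rules, logging a warning for each one skipped."""
    valid = []
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            logger.warning(
                "Skipping rule %d: expected an object, got %s",
                index,
                type(rule).__name__,
            )
            continue
        missing = [key for key in REQUIRED_KEYS if key not in rule]
        if missing:
            logger.warning(
                "Skipping rule %d: missing keys %s", index, ", ".join(missing)
            )
            continue
        valid.append(rule)
    return valid
```

Invalid entries are skipped rather than raised, so one malformed rule does not stop generation of the rest of the configuration.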
- Error Handling: Added try-except blocks to handle file operations, subprocess commands, and permission issues. Logs detailed error messages for debugging.
- Path Handling: Used pathlib.Path for better path manipulation and readability. Made paths configurable via environment variables.
- File Permissions: Ensured the target directory is created with parents=True to handle nested directories. Checked if files already exist in the target directory to avoid unnecessary overwrites.
- Logging: Added more detailed logging for better transparency and debugging.
- Subprocess Security: Added checks for apachectl and systemctl commands to ensure compatibility with supported systems.
- Input Validation: Validated the existence of .conf files before copying them.
- Code Structure: Encapsulated the main logic in a main() function for better organization. Added docstrings to functions for clarity.
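
A rough sketch of the copy step and the command check described above. The function names are hypothetical, and comparing file contents with `filecmp` is an assumed way of avoiding unnecessary overwrites; the commit may use a simpler existence check.

```python
import filecmp
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def copy_conf_files(source_dir, target_dir):
    """Copy *.conf files into target_dir, skipping files that are already identical."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    conf_files = sorted(source_dir.glob("*.conf"))
    if not conf_files:
        logger.warning("No .conf files found in %s", source_dir)
        return []
    # parents=True creates any missing intermediate directories.
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in conf_files:
        dst = target_dir / src.name
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            logger.info("Unchanged, not overwriting: %s", dst)
            continue
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied


def find_reload_command():
    """Return the first reload tool found on PATH, or None if neither exists."""
    for tool in ("apachectl", "systemctl"):
        if shutil.which(tool):
            return tool
    return None
```

Checking with `shutil.which` before calling `subprocess` lets the script log a clear message on systems where neither tool is installed.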
- Error Handling: Added error handling for file operations, JSON parsing, and invalid rule structures. Logs warnings for invalid rules instead of crashing.
- Unique Rule IDs: Each rule is assigned a unique id to avoid collisions in ModSecurity.
- Path Handling: Used pathlib.Path for better path manipulation and readability.
- Logging: Replaced print() with Python's logging module for more flexible and structured logging.
- Input Validation: Added checks for missing keys in the input JSON file.
- Template for Rules: Used a template string (MODSEC_RULE_TEMPLATE) for consistent rule formatting.
- Output Directory Permissions: Ensured the output directory is created with parents=True to handle nested directories.
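
For illustration, a template-based generator with unique IDs might look like the sketch below. Only the name MODSEC_RULE_TEMPLATE comes from the commit; the template body, the base ID, the input keys (`variable`, `pattern`, `message`), and `render_rules` are assumptions.

```python
# Hypothetical template body; the committed MODSEC_RULE_TEMPLATE may differ.
MODSEC_RULE_TEMPLATE = (
    'SecRule {variable} "@rx {pattern}" '
    '"id:{rule_id},phase:2,deny,status:403,log,msg:\'{message}\'"'
)

# Hypothetical starting ID; each rule gets base_id + its position in the list.
BASE_RULE_ID = 9000000


def render_rules(rules, base_id=BASE_RULE_ID):
    """Render one SecRule line per rule, each with a distinct id."""
    lines = []
    for offset, rule in enumerate(rules):
        # Escape double quotes so the pattern cannot end the operator string early.
        pattern = rule["pattern"].replace('"', '\\"')
        lines.append(
            MODSEC_RULE_TEMPLATE.format(
                variable=rule["variable"],
                pattern=pattern,
                rule_id=base_id + offset,
                message=rule["message"],
            )
        )
    return lines
```

Deriving each ID from a running offset guarantees uniqueness within one generated file; avoiding collisions with other rule sets still depends on choosing a base ID in an unused range.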