feat: Integrate tqdm for progress tracking during bot list fetching
feat: Add regex filters to exclude IP addresses and domains from bot lists
refactor: Remove IP and domain-specific sources from BOT_LIST_SOURCES
refactor: Update parse_bot_list to skip lines matching IP or domain regex
refactor: Improve logging for better debugging and clarity
refactor: Use ThreadPoolExecutor with a tqdm progress bar for concurrent fetching
docs: Add comments and docstrings for better code understanding
chore: Ensure output directories exist before generating WAF configurations
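The IP/domain filtering and concurrent fetching described above might look roughly like the sketch below. Only `parse_bot_list` is named in the commits, and its real signature may differ; the regex patterns, the `fetch_all` helper and its injectable `fetch` callable are hypothetical. The tqdm progress bar is left as a comment so the sketch runs with the standard library alone.

```python
import logging
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Set

# Illustrative patterns (assumptions): an IPv4 address with an optional CIDR
# suffix, and a bare domain name such as "crawler.example.com".
IP_REGEX = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}(?:/\d{1,2})?$")
DOMAIN_REGEX = re.compile(
    r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$", re.IGNORECASE
)


def parse_bot_list(text: str) -> Set[str]:
    """Return bot entries, skipping blanks, '#' comments, IPs and domains."""
    bots: Set[str] = set()
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        if IP_REGEX.match(entry) or DOMAIN_REGEX.match(entry):
            continue  # not a user-agent token, so it does not belong in this list
        bots.add(entry)
    return bots


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str], max_workers: int = 8) -> Set[str]:
    """Fetch all sources concurrently and merge their parsed entries."""
    merged: Set[str] = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        # With tqdm installed, this loop could iterate over
        # tqdm(as_completed(futures), total=len(futures)) to draw a progress bar.
        for future in as_completed(futures):
            url = futures[future]
            try:
                merged |= parse_bot_list(future.result())
            except Exception as exc:  # one bad source should not abort the whole run
                logging.warning("Failed to fetch %s: %s", url, exc)
    return merged
```

Results are merged in the main thread as each future completes, so the shared set needs no lock.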
* feat: add CLI support for output file and Git reference
* feat: implement atomic file writes for saving JSON
* feat: add dry-run mode to simulate fetching without saving
* feat: increase connection pool size to avoid "Connection pool is full" warnings
* feat: add progress bar for fetching and processing rule files
* feat: add retries for SHA verification in case of transient errors
* refactor: improve error handling for connection pool-related errors
* refactor: use ThreadPoolExecutor for parallel fetching of rule files
* refactor: improve logging with structured messages
* fix: handle edge cases in tag fetching logic
* fix: handle empty blob content gracefully
* fix: improve SHA verification logging
* docs: add comments and docstrings for better code readability
* chore: update requirements.txt to include tqdm
* test: add unit tests for critical functions
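A minimal sketch of the CLI, atomic save and dry-run behaviour, assuming the script writes a single JSON file. The flag names (`--output`, `--ref`, `--dry-run`), the default file name and `save_json_atomic` are hypothetical. The atomic write follows the usual pattern: write to a temporary file in the target directory, fsync it, then `os.replace()` it over the destination so readers never see a half-written file.

```python
import argparse
import json
import logging
import os
import tempfile
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """CLI with hypothetical flag names for output file, Git ref and dry run."""
    parser = argparse.ArgumentParser(description="Fetch WAF rule files from a Git repository.")
    parser.add_argument("--output", type=Path, default=Path("rules.json"), help="JSON file to write")
    parser.add_argument("--ref", default="main", help="Git tag or branch to fetch rules from")
    parser.add_argument("--dry-run", action="store_true", help="fetch and process, but do not save")
    return parser


def save_json_atomic(data: object, output: Path, dry_run: bool = False) -> None:
    """Write data as JSON so that `output` is either the old file or the complete new one."""
    if dry_run:
        logging.info("Dry run: skipping write of %s", output)
        return
    output.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory: os.replace() is only an
    # atomic rename when source and destination are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=output.parent, prefix=f".{output.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_name, output)
    except BaseException:
        os.unlink(tmp_name)  # never leave a stray temp file behind
        raise
```

On POSIX systems a successful `os.replace()` within one filesystem is an atomic rename, which is why the temp file is placed next to the target rather than in the system temp directory.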
- Error Handling:
  - Added error handling for file operations, JSON parsing, and invalid rule structures.
  - Invalid rules now log a warning instead of crashing the script.
- Path Handling:
  - Used pathlib.Path for better path manipulation and readability.
  - Made paths configurable via environment variables.
- Logging:
  - Replaced print() with Python's logging module for more flexible and structured logging.
- Input Validation:
  - Added checks for missing keys in the input JSON file.
- Rule Formatting:
  - Ensured proper formatting of HAProxy ACL rules.
- Output Directory Creation:
  - Created the output directory with parents=True so that missing parent directories are created as well.
- Code Structure:
  - Encapsulated the main logic in a main() function for better organization.
  - Added docstrings to functions for clarity.
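The HAProxy generator changes listed above could be sketched as follows, assuming the input JSON is a list of objects with `category` and `pattern` keys. The environment variable names (`INPUT_FILE`, `OUTPUT_DIR`), the `path_reg -i` criterion, the `block_<category>` ACL naming and the `waf.acl` output file are all assumptions for illustration, not details taken from the commit.

```python
import json
import logging
import os
import re
from pathlib import Path
from typing import Optional

# Hypothetical environment variables; defaults are illustrative only.
INPUT_FILE = Path(os.environ.get("INPUT_FILE", "owasp_rules.json"))
OUTPUT_DIR = Path(os.environ.get("OUTPUT_DIR", "waf_patterns/haproxy"))
REQUIRED_KEYS = ("category", "pattern")


def format_acl(rule: object) -> Optional[str]:
    """Return one HAProxy ACL line for a rule, or None (with a warning) if invalid."""
    if not isinstance(rule, dict):
        logging.warning("Skipping non-object rule: %r", rule)
        return None
    missing = [key for key in REQUIRED_KEYS if key not in rule]
    if missing:
        logging.warning("Skipping rule with missing keys %s: %r", missing, rule)
        return None
    pattern = str(rule["pattern"])
    if not pattern or re.search(r"\s", pattern):
        # HAProxy splits ACL values on whitespace; this sketch simply rejects such patterns.
        logging.warning("Skipping rule with empty or whitespace-containing pattern: %r", rule)
        return None
    # Normalise the category into a safe ACL name, e.g. "SQL Injection" -> "sql_injection".
    name = re.sub(r"[^a-z0-9_]", "_", str(rule["category"]).lower())
    return f"acl block_{name} path_reg -i {pattern}"


def main(input_file: Path = INPUT_FILE, output_dir: Path = OUTPUT_DIR) -> int:
    """Convert the JSON rules into an HAProxy ACL file; return the number of rules written."""
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    try:
        rules = json.loads(input_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        logging.error("Cannot load %s: %s", input_file, exc)
        raise SystemExit(1) from exc
    if not isinstance(rules, list):
        logging.error("Expected a JSON list of rules in %s", input_file)
        raise SystemExit(1)

    acls = [line for line in map(format_acl, rules) if line is not None]
    output_dir.mkdir(parents=True, exist_ok=True)  # creates nested directories too
    out_file = output_dir / "waf.acl"
    out_file.write_text("\n".join(acls) + "\n", encoding="utf-8")
    logging.info("Wrote %d ACL rules to %s", len(acls), out_file)
    return len(acls)
```

HAProxy ORs together repeated `acl` lines that share a name, so one ACL per category can be used from a single rule such as `http-request deny if block_sqli`. The script's entry point would call `main()`.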