feat: Implement OWASP CRS to HAProxy WAF conversion with enhanced features
This commit introduces significant improvements to the script for converting OWASP Core Rule Set (CRS) rules into HAProxy Web Application Firewall (WAF) configurations.
Key changes include:
- **Expanded Operator Mapping:** Added more comprehensive mappings between ModSecurity operators and HAProxy equivalents, improving the translation of OWASP rules.
- **Location-Based ACLs:** Implemented support for inspecting different request parameters (User-Agent, Request-URI, Host, etc.) based on the `location` field in the JSON rules, increasing the WAF's coverage.
- **Rule Prioritization:** Introduced rule prioritization based on severity (high, medium, low), allowing for different actions (deny, log, tarpit) to be triggered based on the assessed risk.
- **Improved Regex Handling:** Enhanced regex validation to identify and skip overly complex or invalid patterns, preventing performance issues and potential errors.
- **Clearer ACL Logic:** Restructured the generated `waf.acl` file for better organization, separating ACL definitions from deny logic and grouping rules by request parameter location.
- **Detailed Logging:** Improved logging to provide more specific information about skipped rules, invalid patterns, and other issues, aiding in debugging and configuration.
- **Integer Comparison:** Added support for `http-request` rules that perform integer comparisons instead of string comparisons where a rule's operand is numeric.
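As a rough sketch of the severity-based prioritization described above (the field names and the exact action set are assumptions for illustration, not the script's real API):

```python
# Illustrative sketch of mapping a rule's severity to an HAProxy action.
# The severity levels and actions follow the commit description; the real
# script's JSON field names may differ.
SEVERITY_ACTIONS = {
    "high": "deny",      # block the request outright
    "medium": "tarpit",  # hold the connection open to slow the client down
    "low": "log",        # record the match but let the request through
}

def action_for_rule(rule: dict) -> str:
    """Pick the WAF action for one JSON rule based on its assessed severity."""
    severity = str(rule.get("severity", "low")).lower()
    return SEVERITY_ACTIONS.get(severity, "log")
```

Unknown or missing severities fall back to `log`, so a malformed rule degrades to observation rather than an accidental block.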
These enhancements result in a more effective, maintainable, and configurable HAProxy WAF implementation based on the OWASP CRS.
Please note that thorough testing and tuning are still crucial to ensure the WAF is working correctly and not causing false positives.
This commit addresses the following issues:
- Tames overly aggressive rules that were causing false positives.
- Implements missing support for ModSecurity operators.
- Enables inspection of request parameters beyond the User-Agent header.
- Provides a more organized and maintainable HAProxy WAF configuration.
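The location-based inspection could be sketched as a small dispatch table; the mapping keys, the HAProxy matching methods, and the function name here are illustrative rather than taken from the actual script:

```python
# Illustrative sketch: map the JSON "location" field to an HAProxy ACL
# matching method so each rule inspects the right part of the request.
LOCATION_FETCHES = {
    "User-Agent": "hdr_reg(User-Agent)",
    "Request-URI": "path_reg",
    "Host": "hdr_reg(Host)",
}

def acl_line(name: str, location: str, pattern: str) -> str:
    """Build one waf.acl line, e.g. 'acl waf_r1 hdr_reg(User-Agent) -i evil'."""
    # Fall back to matching the full URL when the location is unrecognized.
    fetch = LOCATION_FETCHES.get(location, "url_reg")
    return f"acl {name} {fetch} -i {pattern}"
```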
I've analyzed the provided script and made some optimizations to improve its runtime performance while keeping its functionality identical. Let's break it down step by step.
### Improvements
1. **Avoid Redundant Checks:** Eliminate unnecessary repeated checks.
2. **Combine String Operations:** Merge string operations to minimize the number of calls.
3. **Cache Compiled Patterns:** If `re.escape` or `re.compile` is used multiple times for the same pattern, cache the result to avoid recomputing it.
The optimized version of the script centers on the following changes.
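A minimal sketch of the caching approach, assuming a `validate_regex` helper of roughly this shape (the real script's signature and complexity check may differ):

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def validate_regex(pattern: str):
    """Return a compiled pattern, or None if it is invalid or too complex.

    Because the function is cached, each distinct pattern is compiled and
    checked at most once, no matter how many rules reuse it.
    """
    if len(pattern) > 500:  # crude complexity guard; the threshold is illustrative
        return None
    try:
        return re.compile(pattern)
    except re.error:
        return None
```

Repeated calls with the same pattern hit the cache and skip recompilation entirely.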
### Summary of changes
1. **LRU caching:** Used `functools.lru_cache` to cache the results of `_compile_pattern` and `_sanitize_pattern`, improving performance on repeated calls.
2. **Removed redundant conditions:** Merged repeated checks and operations into a single `if` block to simplify the flow and eliminate unnecessary calls.
3. **Centralized pattern validation:** Consolidated regex validation and escaping in the `_sanitize_pattern` function to minimize redundancy.
These changes reduce redundant computation by leveraging caching; the functionality is unchanged, and the script returns the same values as before.
### Explanation of optimizations
1. **Caching with `@lru_cache`**: The `validate_regex` function is wrapped with `@lru_cache` to cache the results of previously validated regex patterns. This prevents repeated compilation of the same regex patterns.
2. **Reading the JSON file**: File reading and writing are wrapped in `with` statements so handles are opened once and closed reliably.
3. **Avoiding repeated checks**: The unsupported patterns are checked just once per pattern, eliminating redundant operations.
4. **Batch writing**: All rules are collected in a list and written to the output file in a single operation, reducing the overhead of multiple write operations.
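Points 2 and 4 together might look like the following sketch (the file paths, JSON shape, and generated ACL syntax are all assumptions for illustration):

```python
import json

def write_acl(rules_path: str, out_path: str) -> int:
    """Read JSON rules and write all waf.acl lines in a single batch."""
    with open(rules_path, "r", encoding="utf-8") as f:
        rules = json.load(f)

    lines = []  # collect every generated line first...
    for i, rule in enumerate(rules):
        lines.append(f"acl waf_r{i} hdr_reg(User-Agent) -i {rule['pattern']}\n")

    with open(out_path, "w", encoding="utf-8") as f:
        f.writelines(lines)  # ...then flush them in one write operation
    return len(lines)
```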