Port 80 is often firewalled (drops packets → ConnectTimeout) rather than
refused (ConnectError). Previously ConnectTimeout hit the generic except
branch and broke without trying https, marking everything dead.
Now ConnectError, RemoteProtocolError, and ConnectTimeout all trigger an
https retry. ReadTimeout still marks the domain dead (the server accepted
the connection but was too slow to respond).
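A minimal sketch of the exception routing described above. The exception
names match those exposed by httpx, which is an assumption about the client
in use; local stand-in classes keep the sketch dependency-free. `probe` and
the injected `fetch` callable are hypothetical names, not the validator's
actual API.

```python
class ConnectError(Exception): ...
class ConnectTimeout(Exception): ...
class RemoteProtocolError(Exception): ...
class ReadTimeout(Exception): ...

# Failures on http:// that say nothing about whether https:// works.
RETRY_ON_HTTPS = (ConnectError, RemoteProtocolError, ConnectTimeout)


def probe(domain, fetch):
    """Return "live" or "dead" for a domain.

    `fetch(url)` is a hypothetical callable that returns a status code
    or raises one of the exceptions above.
    """
    for scheme in ("http", "https"):
        try:
            fetch(f"{scheme}://{domain}")
            return "live"
        except RETRY_ON_HTTPS:
            continue  # refused, dropped, or garbled: try the next scheme
        except Exception:
            return "dead"  # includes ReadTimeout: connected, but too slow
    return "dead"  # neither scheme produced a response
```

With a firewalled port 80, `fetch` raises ConnectTimeout on the http URL and
the loop moves on to https instead of giving up.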
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rotate across 7 real browser UAs to avoid bot detection
- Any 2xx/3xx/4xx/5xx response = server is UP = live (only no-response = dead)
- Parking signals still checked on 200/203 body content
- Previous 403/404 responses were incorrectly marking live servers as dead
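The rules in the list above could be sketched roughly as follows. The
User-Agent strings and parking markers are placeholders, and the separate
"parked" result is an assumption about how a parking match is reported;
`next_headers` and `classify` are hypothetical names.

```python
from itertools import cycle

# Placeholder strings; the real pool holds 7 full browser User-Agent values.
USER_AGENTS = cycle([
    "Mozilla/5.0 (placeholder desktop UA)",
    "Mozilla/5.0 (placeholder mobile UA)",
])

# Hypothetical parking markers; the actual signal list is not shown here.
PARKING_SIGNALS = ("domain is for sale", "parked free")


def next_headers():
    """Return request headers using the next User-Agent in the rotation."""
    return {"User-Agent": next(USER_AGENTS)}


def classify(status, body=""):
    """Map a response to "live" or "parked"; no response means "dead"."""
    if status is None:
        return "dead"  # nothing answered at all
    if status in (200, 203) and any(s in body.lower() for s in PARKING_SIGNALS):
        return "parked"
    return "live"  # any 2xx/3xx/4xx/5xx proves a server is up
```

Under this sketch a 403 or 404 counts as live, since an error response still
proves that a server answered.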
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previous fix only retried on ConnectError. Servers that accept TCP on port 80
but hang, return protocol errors, or time out also need the https fallback.
Now any exception on http triggers an https retry. A shorter http timeout (4s)
avoids wasting time on a non-responsive port 80.
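One way to picture the per-scheme timeouts, as a rough sketch only. The 4s
http value comes from this change; the https value, `first_response`, and the
`fetch(url, timeout)` callable are assumptions made for illustration.

```python
# 4s for http is from the change above; the https value is an assumption.
SCHEME_TIMEOUTS = (("http", 4.0), ("https", 10.0))


def first_response(domain, fetch):
    """Try each scheme in order and return the first status code, or None.

    `fetch(url, timeout)` is a hypothetical callable that returns a status
    code or raises on any failure.
    """
    for scheme, timeout in SCHEME_TIMEOUTS:
        try:
            return fetch(f"{scheme}://{domain}", timeout)
        except Exception:
            continue  # any failure on this scheme moves on to the next one
    return None
```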
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds rescan_dead flag that causes _filter_unvalidated to treat
previously-dead domains as needing a fresh check. Useful after
fixing the http/https detection bug.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Many modern servers refuse HTTP connections entirely. The validator was
only trying http://, causing HTTPS-only sites to be wrongly marked dead.
Now falls back to https:// on ConnectError. Also increased timeouts slightly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>