AI worker fixes (root cause of "nothing reaches Replicate"):
- Worker task died silently — no exception handler around while loop
- Added try/except around entire loop body with exc_info logging
- Added watchdog task that restarts dead workers every 10 seconds
- ensure_workers_alive() called on every /api/ai/assess/batch POST
- _assess_one() is now a top-level function (not closure) — avoids
subtle scoping bugs with async inner functions in while loops
- /api/ai/debug endpoint: shows worker alive status, task exception,
last 10 queue entries — browse to /api/ai/debug to diagnose
- /api/ai/worker/restart endpoint + UI button
- "Restart AI worker" button + "Debug AI queue" link in enrichment tab
site_analyzer.py — new signals:
- IP resolution + ip-api.com for ASN, org, ISP, host country
- EU hosting detection (27 EU + EEA + adequacy countries)
- GDPR: detects Cookiebot, OneTrust, CookiePro, Osano, Iubenda,
Borlabs, CookieYes, Complianz, Usercentrics + text signals
- Privacy policy and GDPR text presence
- Accessibility: html lang missing, images without alt count,
skip nav link, empty links, inputs without labels
Gemini prompt additions:
- Hosting section: IP, ASN, org/ISP, EU vs non-EU flag
- GDPR section: cookie tool, notice, privacy policy
- Accessibility section: all quick-scan results
- New output fields: hosting_notes, gdpr_compliance,
accessibility_issues[]
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Build /data/domains.duckdb on first run (tld+parts columns + ART index)
→ TLD filter goes from ~60s full scan to <100ms index lookup
→ System still works (slower) while index builds in background
- New /api/domains params: alpha_only, no_sld, keyword
→ alpha_only: domains with only letters (no hyphens/numbers)
→ no_sld: parts=2, excludes com.es / net.es patterns
→ keyword: LIKE '%term%' niche search
- /api/domains and /api/enriched now return total count for pagination
- Pagination: shows total matches, page X of Y, Next disabled at last page
- Enrich button: toast notifications instead of alert(), error handling
- Select all on page button, clear selection button
- Stats/TLD breakdown cached after first load (no repeat full scan)
- Header shows index build status (building → ready)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FastAPI backend with DuckDB pushdown queries on 72M parquet
- Async enrichment worker: HTTP, SSL, DNS MX, CMS fingerprint, ip-api.com
- Resumable parquet download with HTTP Range support
- Lead scoring engine (max 100 pts, target countries ES,GB,DE,FR,RO,PT,AD,IT)
- Single-file Alpine.js + Chart.js dashboard on port 6677
- SQLite enrichment DB with job queue and scores tables
- Dockerized with persistent /data volume
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>