Fix two bugs in the Phase 2 DeepSeek classification:
1. _parse_classify_output stripped the <think> block before searching for JSON.
DeepSeek-R1 often puts the JSON array inside the <think> block (especially
when it "decides" mid-reasoning), so stripping it first destroyed the data.
Fix: search the full output first, then inside <think>, then the stripped
text; three fallback strategies, each logged at info level.
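A minimal sketch of the three-step fallback, assuming the answer is a JSON array located with a greedy first-"[" to last-"]" regex (an assumption; the real _parse_classify_output is not shown in this commit). parse_classify_output below is a hypothetical stand-in:

```python
import json
import logging
import re
from typing import Optional

logger = logging.getLogger(__name__)

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Greedy: spans from the first "[" to the last "]" of the searched text.
ARRAY_RE = re.compile(r"\[.*\]", re.DOTALL)


def _find_json_array(text: str) -> Optional[list]:
    """Return the bracketed span of text if it parses as a JSON list."""
    match = ARRAY_RE.search(text)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, list) else None


def parse_classify_output(raw: str) -> list:
    # Hypothetical stand-in for _parse_classify_output.
    think = THINK_RE.search(raw)
    candidates = [
        ("full output", raw),                                   # 1. everything
        ("inside <think>", think.group(1) if think else ""),    # 2. reasoning only
        ("think-stripped", THINK_RE.sub("", raw)),              # 3. final answer only
    ]
    for label, text in candidates:
        result = _find_json_array(text)
        if result is not None:
            logger.info("classify: parsed %d items from %s", len(result), label)
            return result
        logger.info("classify: no JSON array in %s", label)
    return []
```

The narrower candidates matter because the greedy regex over the full output can swallow stray brackets from the reasoning text; searching only the <think> body or only the post-<think> answer gives it a cleaner slice.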
2. The Phase 2 save used a bare UPDATE ... WHERE domain=?, which silently
updates nothing if the domain's row doesn't exist yet in enriched_domains.
Fix: replace it with INSERT ... ON CONFLICT DO UPDATE (a true upsert).
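A minimal sketch of the upsert, assuming SQLite 3.24+ (the first release with ON CONFLICT ... DO UPDATE) and that domain is the PRIMARY KEY or carries a UNIQUE constraint, which the conflict target requires. save_classifications() and the exact column list are hypothetical:

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, Iterable

# Requires domain to be the PRIMARY KEY (or UNIQUE) of enriched_domains.
UPSERT_SQL = """
INSERT INTO enriched_domains (domain, niche, site_type, prescreen_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(domain) DO UPDATE SET
    niche = excluded.niche,
    site_type = excluded.site_type,
    prescreen_at = excluded.prescreen_at
"""


def save_classifications(conn: sqlite3.Connection,
                         rows: Iterable[Dict[str, str]]) -> None:
    """Insert each classified domain, or overwrite it if the row exists."""
    now = datetime.now(timezone.utc).isoformat()
    params = [(r["domain"], r.get("niche"), r.get("type"), now) for r in rows]
    with conn:  # commit on success, roll back on exception
        conn.executemany(UPSERT_SQL, params)
```

Unlike the bare UPDATE, whose cursor.rowcount is 0 when the row is missing, this creates the row on the first save and overwrites the classification on later ones.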
Also add logger.info calls so the container logs show the raw DeepSeek output
and the parsed result count, for easier debugging.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1 (no AI credits): httpx checks every selected domain concurrently
(30 in parallel) with a real browser User-Agent and labels each one live,
dead, parked, or redirect.
Parked detection: keyword scan of the body and title, plus a check for
redirects to known parking hosts.
Results are saved to the DB immediately; dead and parked domains never reach DeepSeek.
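A sketch of Phase 1 under stated assumptions: the parking keywords and hosts are illustrative placeholders, any 4xx/5xx status counts as dead, and the fetcher is injected so the logic runs offline. make_httpx_fetcher() shows how an httpx.AsyncClient (created with follow_redirects=True and a browser User-Agent header) could plug in. All names are hypothetical, not the contents of app/prescreener.py:

```python
import asyncio
import re
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit

# Illustrative placeholders; the real signal lists live in app/prescreener.py.
PARKING_KEYWORDS = ("domain is for sale", "buy this domain", "parked free")
PARKING_HOSTS = ("sedoparking.com", "parkingcrew.net", "bodis.com")
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


@dataclass
class Fetched:
    status: int     # final HTTP status after redirects were followed
    final_url: str  # URL the redirect chain ended on
    title: str
    body: str


def _bare_host(host: str) -> str:
    """Lower-case host with any trailing dot and leading 'www.' removed."""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def classify(domain: str, fetched: Optional[Fetched]) -> str:
    if fetched is None or fetched.status >= 400:
        return "dead"  # connection failure or error status (assumed rule)
    final_host = _bare_host(urlsplit(fetched.final_url).hostname or "")
    if any(final_host == h or final_host.endswith("." + h) for h in PARKING_HOSTS):
        return "parked"  # redirected onto a known parking host
    text = f"{fetched.title} {fetched.body}".lower()
    if any(k in text for k in PARKING_KEYWORDS):
        return "parked"
    # Same-domain hops (http -> https, example.com -> www.example.com)
    # are not treated as redirects.
    if final_host != _bare_host(domain):
        return "redirect"
    return "live"


Fetcher = Callable[[str], Awaitable[Optional[Fetched]]]


async def prescreen(domains: Iterable[str], fetch: Fetcher,
                    concurrency: int = 30) -> Dict[str, str]:
    sem = asyncio.Semaphore(concurrency)

    async def check_one(domain: str):
        async with sem:  # at most `concurrency` fetches in flight
            return domain, classify(domain, await fetch(domain))

    return dict(await asyncio.gather(*(check_one(d) for d in domains)))


def make_httpx_fetcher(client) -> Fetcher:
    """Adapt an httpx.AsyncClient built with follow_redirects=True."""
    import httpx  # third-party; imported lazily so the offline parts run without it

    async def fetch(domain: str) -> Optional[Fetched]:
        try:
            resp = await client.get(f"https://{domain}")
        except httpx.HTTPError:
            return None  # DNS failure, refused connection, timeout, ...
        match = TITLE_RE.search(resp.text)
        title = match.group(1).strip() if match else ""
        return Fetched(resp.status_code, str(resp.url), title, resp.text)

    return fetch
```

The redirect-to-parking-host check runs before the same-domain comparison, so a domain bounced to a parking service is reported as parked rather than as a plain redirect.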
Phase 2 (one DeepSeek call per batch): the titles and snippets of all live
sites are bundled into ONE Replicate/DeepSeek-R1 request, which returns a
niche + type for every domain in the batch (up to 80 domains per call;
larger selections are split into batches that run in parallel).
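Splitting the live sites into batches of 80 and calling the model for each batch concurrently might look like the sketch below. The prompt wording, the site dict keys, and classify_all() are hypothetical (classify_with_deepseek() itself is not shown); call_model stands in for the Replicate request so the sketch runs without network access, and parse is where a parser like _parse_classify_output would plug in:

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

BATCH_SIZE = 80  # up to 80 domains per DeepSeek call, per the commit message

Site = Dict[str, str]  # assumed keys: domain, title, snippet


def chunked(items: List[Site], size: int = BATCH_SIZE) -> List[List[Site]]:
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_prompt(sites: List[Site]) -> str:
    # Hypothetical prompt: one "domain | title | snippet" line per site.
    header = ('For each site below, return a JSON array of objects with keys '
              '"domain", "niche" and "type".')
    rows = [f"{s['domain']} | {s['title']} | {s['snippet']}" for s in sites]
    return "\n".join([header, *rows])


async def classify_all(
    sites: List[Site],
    call_model: Callable[[str], Awaitable[str]],
    parse: Callable[[str], list],
) -> list:
    prompts = [build_prompt(batch) for batch in chunked(sites)]
    # One request per batch; gather runs them concurrently and keeps input order.
    outputs = await asyncio.gather(*(call_model(p) for p in prompts))
    results = []
    for raw in outputs:
        results.extend(parse(raw))
    return results
```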
- app/prescreener.py (new): _check_one(), prescreen_domains(),
classify_with_deepseek(), parking signal lists, same-domain redirect logic
- app/db.py: prescreen_status/niche/site_type/prescreen_at columns +
migrations; save_prescreen_results() upsert helper
- app/main.py: POST /api/prescreen/batch endpoint
- app/static/index.html:
  - 🔍 Pre-screen button (disabled while running, shows a spinner)
  - Niche + Type columns in the Browse and Leads tables (.pni/.pty pills)
  - Prescreen status colour dot (●) shown while the niche is not yet set
  - prescreening state flag; the result toast shows per-status counts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>