Commit Graph

4 Commits

468d76387d fix: 429 retry, sequential batching, force UI refresh after prescreen
1. prescreener.py: classify_with_deepseek now retries on 429 with
   exponential back-off (5s → 10s → 20s → 40s, up to 4 attempts);
   same back-off also covers other transient errors.

2. main.py: prescreen batches now run sequentially with a 3s gap
   instead of in parallel via asyncio.gather. Running batches in
   parallel caused the second batch to hit the 429 rate limit every
   time, leaving most domains unclassified (only the smaller final
   batch succeeded).

3. index.html: prescreenSelected() now clears this.domains before
   calling _fetch() so Alpine re-renders the full table with the
   updated niche/type values; also updates the notify hint to mention
   the expected 1-2 min wait.
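
The retry logic in item 1 can be sketched as follows. This is a hedged sketch, not the actual prescreener.py code: `call_with_backoff` and `TransientError` are hypothetical names, and the real code presumably inspects httpx status codes rather than a custom exception.

```python
import asyncio
import logging

log = logging.getLogger(__name__)

class TransientError(Exception):
    """Stand-in for a 429 or other transient failure (assumption:
    the real code inspects HTTP response codes instead)."""

async def call_with_backoff(fn, *, attempts=4, base_delay=5.0):
    """Retry an async callable with exponential back-off.

    Defaults follow the commit: up to 4 attempts, with the delay
    doubling from 5s (5 -> 10 -> 20 -> 40). A sketch only.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await fn()
        except TransientError as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the error
            log.info("attempt %d failed (%s); retrying in %.1fs",
                     attempt, exc, delay)
            await asyncio.sleep(delay)
            delay *= 2  # exponential back-off
```

The same wrapper covers any transient error, not just 429s, which matches the commit's note that the back-off also handles other flaky failures.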

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:52:39 +02:00
a30085975e fix: poll Replicate for DeepSeek-R1 async predictions (202 Accepted)
DeepSeek-R1 is too slow for Replicate's synchronous wait; the API
returns 202 with a prediction URL instead of the completed output.
Added a polling loop:
- POST with Prefer: wait=60
- If 202 or status=starting/processing, poll urls.get every 2s up to 90×
  (~3 min ceiling)
- On succeeded, use the final response data as normal
- On failed/canceled/timeout, log and return []
Also guards against output=None before calling str.join().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:43:13 +02:00
a0c9db1ef2 fix: DeepSeek niche/type not saving to DB
Two bugs:
1. _parse_classify_output stripped <think> block before searching for JSON.
   DeepSeek-R1 often puts the JSON array inside the think block (especially
   when it "decides" mid-reasoning), so stripping it first destroyed the data.
   Fix: search full output first, then inside <think>, then stripped — three
   fallback strategies with info logging at each step.

2. Phase 2 save used bare UPDATE WHERE domain=? which silently does nothing
   if the domain row doesn't exist yet in enriched_domains.
   Fix: replace with INSERT ... ON CONFLICT DO UPDATE (true upsert).
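
The three-fallback parse in item 1 can be sketched as below. The regexes and the exact fallback mechanics are assumptions; only the strategy order (full output, then inside `<think>`, then stripped) comes from the commit.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ARRAY_RE = re.compile(r"\[.*\]", re.DOTALL)

def parse_classify_output(raw: str):
    """Sketch of _parse_classify_output's three fallbacks:
    1. search the full output for a JSON array,
    2. search inside the <think> block,
    3. search the output with <think> stripped.
    Returns the parsed list, or [] if nothing parses.
    """
    candidates = [raw]
    think = THINK_RE.search(raw)
    if think:
        candidates.append(think.group(1))
        candidates.append(THINK_RE.sub("", raw))
    for text in candidates:
        m = ARRAY_RE.search(text)
        if m:
            try:
                return json.loads(m.group(0))
            except json.JSONDecodeError:
                continue  # fall through to the next strategy
    return []
```

Searching the full output first means a JSON array inside `<think>` is still found, which is exactly the case the original strip-first order destroyed.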

Also adds logger.info lines so container logs show raw DeepSeek output
and parse result count for easy debugging.
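
The upsert in item 2, sketched against sqlite3. The table name comes from the commit; the exact column set and helper name are assumptions.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO enriched_domains (domain, niche, site_type)
VALUES (?, ?, ?)
ON CONFLICT(domain) DO UPDATE SET
    niche = excluded.niche,
    site_type = excluded.site_type
"""

def save_classification(conn: sqlite3.Connection, domain, niche, site_type):
    """True upsert: inserts the row if missing, updates it otherwise.
    A bare UPDATE ... WHERE domain=? silently affects 0 rows when the
    domain has not been written to enriched_domains yet."""
    conn.execute(UPSERT_SQL, (domain, niche, site_type))
    conn.commit()
```

Note that `ON CONFLICT(domain)` requires a UNIQUE or PRIMARY KEY constraint on `domain`, which the enriched_domains schema presumably has.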

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:35:49 +02:00
7fc510f903 feat: two-phase pre-screening with HTTP check + DeepSeek batch classification
Phase 1 (no AI credits): httpx checks every selected domain concurrently
(30 parallel) with a real browser UA, detecting live/dead/parked/redirect.
Parked detection: keyword scan in body/title plus a redirect check against
known parking hosts. Results are saved to the DB immediately; dead/parked
domains never reach DeepSeek.

Phase 2 (single DeepSeek call): all live-site titles + snippets are
bundled into ONE Replicate/DeepSeek-R1 request → returns niche + type
for every domain in the batch (up to 80 per call, parallelised if more).
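
The "up to 80 per call" bundling can be sketched as a chunking helper plus a prompt builder. Both function names and the prompt format are illustrative, not the real template.

```python
BATCH_SIZE = 80  # max domains per DeepSeek call, per the commit

def chunk(items, size=BATCH_SIZE):
    """Split the live-domain list into batches of at most `size`;
    each batch becomes one Replicate/DeepSeek-R1 request."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_prompt(batch):
    """Bundle title + snippet lines for one batch into a single
    prompt (format is hypothetical)."""
    lines = [f"{d['domain']} | {d['title']} | {d['snippet']}" for d in batch]
    return ("Classify niche and site type for each domain:\n"
            + "\n".join(lines))
```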

- app/prescreener.py (new): _check_one(), prescreen_domains(),
  classify_with_deepseek(), parking signal lists, same-domain redirect logic
- app/db.py: prescreen_status/niche/site_type/prescreen_at columns +
  migrations; save_prescreen_results() upsert helper
- app/main.py: POST /api/prescreen/batch endpoint
- app/static/index.html:
  - 🔍 Pre-screen button (disabled while running, shows spinner)
  - Niche + Type columns in Browse and Leads tables (.pni/.pty pills)
  - Prescreen status colour dot (●) when niche not yet set
  - prescreening state flag; result toast shows per-status counts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:22:45 +02:00