Commit Graph

4 Commits

468d76387d fix: 429 retry, sequential batching, force UI refresh after prescreen
1. prescreener.py: classify_with_deepseek now retries on 429 with
   exponential back-off (5s → 10s → 20s → 40s, up to 4 attempts);
   same back-off also covers other transient errors.

2. main.py: prescreen batches now run sequentially with a 3s gap
   instead of in parallel via asyncio.gather. Running batches in
   parallel caused the second batch to hit the 429 rate limit every
   time, leaving most domains unclassified (only the smaller final
   batch succeeded).

3. index.html: prescreenSelected() now clears this.domains before
   calling _fetch() so Alpine re-renders the full table with the
   updated niche/type values; also updates the notify hint to mention
   the expected 1-2 min wait.
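
The retry logic in item 1 can be sketched as follows. This is a hedged sketch, not the actual prescreener.py code: `call_with_backoff` and `TransientError` are hypothetical names, and the real code presumably inspects httpx status codes rather than a custom exception.

```python
import asyncio
import logging

log = logging.getLogger(__name__)

class TransientError(Exception):
    """Stand-in for a 429 or other transient failure (assumption:
    the real code inspects HTTP response codes instead)."""

async def call_with_backoff(fn, *, attempts=4, base_delay=5.0):
    """Retry an async callable with exponential back-off.

    Defaults follow the commit: up to 4 attempts, with the delay
    doubling from 5s (5 -> 10 -> 20 -> 40). A sketch only.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await fn()
        except TransientError as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the error
            log.info("attempt %d failed (%s); retrying in %.1fs",
                     attempt, exc, delay)
            await asyncio.sleep(delay)
            delay *= 2  # exponential back-off
```

The same wrapper covers any transient error, not just 429s, which matches the commit's note that the back-off also handles other flaky failures.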

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:52:39 +02:00
a30085975e fix: poll Replicate for DeepSeek-R1 async predictions (202 Accepted)
DeepSeek-R1 is too slow for Replicate's synchronous wait; the API
returns 202 with a prediction URL instead of the completed output.
Added a polling loop:
- POST with Prefer: wait=60
- If 202 or status=starting/processing, poll urls.get every 2s up to 90×
  (~3 min ceiling)
- On succeeded, use the final response data as normal
- On failed/canceled/timeout, log and return []
Also guards against output=None before calling str.join().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:43:13 +02:00
a0c9db1ef2 fix: DeepSeek niche/type not saving to DB
Two bugs:
1. _parse_classify_output stripped <think> block before searching for JSON.
   DeepSeek-R1 often puts the JSON array inside the think block (especially
   when it "decides" mid-reasoning), so stripping it first destroyed the data.
   Fix: search full output first, then inside <think>, then stripped — three
   fallback strategies with info logging at each step.

2. Phase 2 save used bare UPDATE WHERE domain=? which silently does nothing
   if the domain row doesn't exist yet in enriched_domains.
   Fix: replace with INSERT ... ON CONFLICT DO UPDATE (true upsert).
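
The three-fallback parse in item 1 can be sketched as below. The regexes and the exact fallback mechanics are assumptions; only the strategy order (full output, then inside `<think>`, then stripped) comes from the commit.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ARRAY_RE = re.compile(r"\[.*\]", re.DOTALL)

def parse_classify_output(raw: str):
    """Sketch of _parse_classify_output's three fallbacks:
    1. search the full output for a JSON array,
    2. search inside the <think> block,
    3. search the output with <think> stripped.
    Returns the parsed list, or [] if nothing parses.
    """
    candidates = [raw]
    think = THINK_RE.search(raw)
    if think:
        candidates.append(think.group(1))
        candidates.append(THINK_RE.sub("", raw))
    for text in candidates:
        m = ARRAY_RE.search(text)
        if m:
            try:
                return json.loads(m.group(0))
            except json.JSONDecodeError:
                continue  # fall through to the next strategy
    return []
```

Searching the full output first means a JSON array inside `<think>` is still found, which is exactly the case the original strip-first order destroyed.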

Also adds logger.info lines so container logs show raw DeepSeek output
and parse result count for easy debugging.
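
The upsert in item 2, sketched against sqlite3. The table name comes from the commit; the exact column set and helper name are assumptions.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO enriched_domains (domain, niche, site_type)
VALUES (?, ?, ?)
ON CONFLICT(domain) DO UPDATE SET
    niche = excluded.niche,
    site_type = excluded.site_type
"""

def save_classification(conn: sqlite3.Connection, domain, niche, site_type):
    """True upsert: inserts the row if missing, updates it otherwise.
    A bare UPDATE ... WHERE domain=? silently affects 0 rows when the
    domain has not been written to enriched_domains yet."""
    conn.execute(UPSERT_SQL, (domain, niche, site_type))
    conn.commit()
```

Note that `ON CONFLICT(domain)` requires a UNIQUE or PRIMARY KEY constraint on `domain`, which the enriched_domains schema presumably has.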

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:35:49 +02:00
7fc510f903 feat: two-phase pre-screening with HTTP check + DeepSeek batch classification
Phase 1 (no AI credits): httpx checks every selected domain concurrently
(30 parallel) with a real browser UA, detecting live/dead/parked/redirect.
Parked detection: keyword scan in body/title plus a redirect check against
known parking hosts. Results are saved to the DB immediately; dead/parked
domains never reach DeepSeek.

Phase 2 (single DeepSeek call): all live-site titles + snippets are
bundled into ONE Replicate/DeepSeek-R1 request → returns niche + type
for every domain in the batch (up to 80 per call, parallelised if more).
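
The "up to 80 per call" bundling can be sketched as a chunking helper plus a prompt builder. Both function names and the prompt format are illustrative, not the real template.

```python
BATCH_SIZE = 80  # max domains per DeepSeek call, per the commit

def chunk(items, size=BATCH_SIZE):
    """Split the live-domain list into batches of at most `size`;
    each batch becomes one Replicate/DeepSeek-R1 request."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_prompt(batch):
    """Bundle title + snippet lines for one batch into a single
    prompt (format is hypothetical)."""
    lines = [f"{d['domain']} | {d['title']} | {d['snippet']}" for d in batch]
    return ("Classify niche and site type for each domain:\n"
            + "\n".join(lines))
```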

- app/prescreener.py (new): _check_one(), prescreen_domains(),
  classify_with_deepseek(), parking signal lists, same-domain redirect logic
- app/db.py: prescreen_status/niche/site_type/prescreen_at columns +
  migrations; save_prescreen_results() upsert helper
- app/main.py: POST /api/prescreen/batch endpoint
- app/static/index.html:
  - 🔍 Pre-screen button (disabled while running, shows spinner)
  - Niche + Type columns in Browse and Leads tables (.pni/.pty pills)
  - Prescreen status colour dot (●) when niche not yet set
  - prescreening state flag; result toast shows per-status counts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:22:45 +02:00