fix: DeepSeek niche/type not saving to DB

Two bugs:
1. _parse_classify_output stripped the <think> block before searching for JSON.
   DeepSeek-R1 often puts the JSON array inside the think block (especially
   when it "decides" mid-reasoning), so stripping it first destroyed the data.
   Fix: search the full output first, then the contents of <think>, then the
   output with <think> stripped. Each of the three steps logs at info level.

2. Phase 2 save used a bare UPDATE ... WHERE domain=?, which silently does
   nothing if the domain row doesn't exist yet in enriched_domains.
   Fix: replace with INSERT ... ON CONFLICT(domain) DO UPDATE (true upsert).
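
The difference described in item 2 can be reproduced with the standard-library
sqlite3 module. This is only a sketch: the three-column schema and the sample
values are assumptions made for illustration, and the real enriched_domains
table likely has more columns. It also assumes domain carries a PRIMARY KEY or
UNIQUE constraint, which ON CONFLICT(domain) requires.

```python
import sqlite3

# Hypothetical minimal schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enriched_domains ("
    "domain TEXT PRIMARY KEY, niche TEXT, site_type TEXT)"
)

# Bare UPDATE: no row matches, so nothing is written and no error is raised.
cur = conn.execute(
    "UPDATE enriched_domains SET niche=?, site_type=? WHERE domain=?",
    ("travel", "blog", "example.com"),
)
update_rowcount = cur.rowcount

UPSERT = """INSERT INTO enriched_domains (domain, niche, site_type)
VALUES (?, ?, ?)
ON CONFLICT(domain) DO UPDATE SET
    niche=excluded.niche,
    site_type=excluded.site_type"""

# The first call inserts the row; the second hits the conflict and updates it.
conn.execute(UPSERT, ("example.com", "travel", "blog"))
conn.execute(UPSERT, ("example.com", "finance", "shop"))

rows = conn.execute(
    "SELECT domain, niche, site_type FROM enriched_domains"
).fetchall()
```

The UPDATE reports zero affected rows, which is the only trace of the silent
no-op; the upsert leaves exactly one row holding the latest values.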

Also adds logger.info lines so container logs show raw DeepSeek output
and parse result count for easy debugging.
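
A minimal sketch of the three-step search from item 1, with the info logging
mentioned above. It is not the code in this commit: parse_classify_output, the
regexes, and the rule that only a list of objects counts as a result are all
assumptions chosen for illustration.

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ARRAY_RE = re.compile(r"\[.*\]", re.DOTALL)  # first "[" through last "]"


def _find_json_array(text):
    """Return a JSON list of objects found in text, or None."""
    match = ARRAY_RE.search(text)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Reject stray brackets such as "[1]" that parse but are not results.
    if isinstance(data, list) and all(isinstance(x, dict) for x in data):
        return data
    return None


def parse_classify_output(raw):
    """Try full output, then inside <think>, then output without <think>."""
    logger.info("raw model output: %r", raw)
    think = THINK_RE.search(raw)
    candidates = [
        ("full output", raw),
        ("inside <think>", think.group(1) if think else ""),
        ("<think> stripped", THINK_RE.sub("", raw)),
    ]
    for label, text in candidates:
        result = _find_json_array(text)
        if result is not None:
            logger.info("parsed %d items via %s", len(result), label)
            return result
        logger.info("no JSON array found via %s", label)
    return []
```

The order matters: an array that appears only inside <think> is found by one of
the first two steps, while reasoning text that contains stray brackets falls
through to the stripped output.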

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 21:35:49 +02:00
parent 7fc510f903
commit a0c9db1ef2
2 changed files with 49 additions and 10 deletions


@@ -436,10 +436,14 @@ async def save_prescreen_results(results: list[dict]):
 niche = r.get("niche")
 site_type = r.get("type") # DeepSeek returns "type" key
 if niche or site_type:
-    # Classification-only update (domain row must already exist)
+    # Upsert niche/type — works even if the row was never enriched
     await db.execute(
-        "UPDATE enriched_domains SET niche=?, site_type=? WHERE domain=?",
-        (niche, site_type, domain),
+        """INSERT INTO enriched_domains (domain, niche, site_type)
+        VALUES (?, ?, ?)
+        ON CONFLICT(domain) DO UPDATE SET
+        niche=excluded.niche,
+        site_type=excluded.site_type""",
+        (domain, niche, site_type),
     )
 else:
     # Prescreen status upsert — create row if it doesn't exist yet