feat: deep site analysis engine + fix AI assess for any domain

site_analyzer.py (new):
- Fresh scrape with timing, page size, server, CMS detection
- Lorem ipsum detection (16 phrases incl. user's example)
- Placeholder content detection (hello world, sample page, etc.)
- Analytics: GA4, GTM, Facebook Pixel, Hotjar, Clarity
- Webmaster: Google Search Console, Bing, Yandex verification tags
- sitemap.xml and robots.txt check + Googlebot block detection
- Mobile viewport check, word count, image/script count
- Full contact extraction: emails, phones, WhatsApp, social links
- Kit Digital signal detection
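
The content and tracking checks listed above can be sketched as pure functions over an HTML string. This is only an illustration, not the code in site_analyzer.py: the phrase lists, regex patterns, and the names `find_phrases` and `analyze_html` are assumptions, and the real module's 16 lorem ipsum phrases are not reproduced here.

```python
import re

# Hypothetical phrase lists; the commit mentions 16 lorem ipsum phrases
# but does not list them, so these are stand-ins.
LOREM_PHRASES = ["lorem ipsum", "dolor sit amet", "consectetur adipiscing"]
PLACEHOLDER_PHRASES = ["hello world", "sample page"]

# Illustrative markers only; real tracking snippets vary between sites.
ANALYTICS_PATTERNS = {
    "ga4": re.compile(r"gtag\(\s*['\"]config['\"]\s*,\s*['\"]G-[A-Z0-9]+['\"]"),
    "gtm": re.compile(r"GTM-[A-Z0-9]+"),
    "facebook_pixel": re.compile(r"connect\.facebook\.net/[^\"']*fbevents\.js"),
    "hotjar": re.compile(r"static\.hotjar\.com"),
    "clarity": re.compile(r"clarity\.ms/tag"),
}

VIEWPORT_RE = re.compile(r"<meta[^>]+name=['\"]viewport['\"]", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def find_phrases(text, phrases):
    """Return the phrases that occur in text, case-insensitively."""
    lowered = text.lower()
    return [p for p in phrases if p in lowered]


def analyze_html(html):
    """Run the content checks on raw HTML and return a plain dict."""
    # Crude tag stripping: inline script/style bodies are still counted as
    # text, so word_count is only an approximation.
    text = TAG_RE.sub(" ", html)
    return {
        "lorem_matches": find_phrases(text, LOREM_PHRASES),
        "placeholder_matches": find_phrases(text, PLACEHOLDER_PHRASES),
        "analytics": sorted(
            name for name, pat in ANALYTICS_PATTERNS.items() if pat.search(html)
        ),
        "has_viewport": bool(VIEWPORT_RE.search(html)),
        "word_count": len(text.split()),
    }
```

Keeping the checks free of network access makes them easy to unit-test against saved HTML fixtures; the fetch, timing, and sitemap/robots checks would live in a separate layer.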

AI worker fix:
- No longer requires pre-enrichment; works on ANY selected domain
- Runs a fresh site_analyzer scrape, then calls Gemini with full context
- Stores site_analysis JSON alongside AI assessment
- Upserts into enriched_domains even if domain was never enriched
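
A minimal sketch of the upsert described above, using the standard-library sqlite3 module rather than the project's aiosqlite. The column names `ai_assessment` and `site_analysis`, the helper names, and the assumption that `domain` is the primary key are all hypothetical; the real schema is not shown in this commit. `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.

```python
import json
import sqlite3


def make_demo_db():
    """Create an in-memory table with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enriched_domains ("
        "domain TEXT PRIMARY KEY, ai_assessment TEXT, site_analysis TEXT)"
    )
    return conn


def save_assessment(conn, domain, assessment, site_analysis):
    """Insert the row if the domain is new, otherwise overwrite both JSON columns."""
    conn.execute(
        """
        INSERT INTO enriched_domains (domain, ai_assessment, site_analysis)
        VALUES (?, ?, ?)
        ON CONFLICT(domain) DO UPDATE SET
            ai_assessment = excluded.ai_assessment,
            site_analysis = excluded.site_analysis
        """,
        (domain, json.dumps(assessment), json.dumps(site_analysis)),
    )
    conn.commit()
```

Calling `save_assessment` twice for the same domain leaves a single row holding the latest assessment, which is the behavior needed when a domain was never enriched beforehand.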

Gemini prompt now includes:
- Complete technical snapshot (load time, size, server, SSL)
- Full SEO signals (sitemap, robots, analytics, webmaster verified)
- Content quality (lorem ipsum matches, placeholder matches)
- Kit Digital signals
- All extracted contacts
- 500-word page text sample
- Outputs: summary, site_quality_score/10, content_issues[],
  urgency_signals[], performance_notes, seo_status,
  best_contact_channel+value, all_contacts, ES pitch,
  services_needed, outreach_notes
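
As a rough illustration of two pieces of the prompt contract above, the sketch below truncates page text to a 500-word sample and checks a JSON response for the output keys named in this message. The function names are hypothetical, and the key names for the contact value and the ES pitch are not spelled out here, so they are left out of the check.

```python
import json

# Keys taken from the commit message; the exact names for the contact
# "value" and the ES pitch are not given, so they are omitted.
EXPECTED_KEYS = (
    "summary",
    "site_quality_score",
    "content_issues",
    "urgency_signals",
    "performance_notes",
    "seo_status",
    "best_contact_channel",
    "all_contacts",
    "services_needed",
    "outreach_notes",
)


def text_sample(page_text, max_words=500):
    """Return at most max_words whitespace-separated words of the page text."""
    return " ".join(page_text.split()[:max_words])


def missing_keys(raw_response):
    """Parse a JSON response and list the expected keys it lacks."""
    data = json.loads(raw_response)
    return [key for key in EXPECTED_KEYS if key not in data]
```

A non-empty result from `missing_keys` could be logged or used to trigger a retry before the assessment is stored.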

UI: rich AI modal with summary banner, quality grid, content issues,
    urgency signals, full contact list, technical snapshot

Fixes: correct Replicate token, ai_queue status='running' bug

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 17:46:01 +02:00
parent faca4b6e1a
commit 5ad8259c75
7 changed files with 530 additions and 111 deletions

@@ -177,22 +177,16 @@ async def ai_status():

 @app.post("/api/ai/assess/single")
 async def ai_assess_single(body: dict):
-    """Immediate (blocking) AI assessment of a single domain."""
+    """Immediate (blocking) AI assessment — does fresh scrape, no pre-enrichment needed."""
     domain = body.get("domain")
     if not domain:
         return JSONResponse({"error": "no domain"}, status_code=400)
+    from app.site_analyzer import analyze_site
     from app.replicate_ai import assess_domain as gemini_assess
-    async with aiosqlite.connect(SQLITE_PATH) as db:
-        db.row_factory = aiosqlite.Row
-        async with db.execute(
-            "SELECT * FROM enriched_domains WHERE domain=?", (domain,)
-        ) as cur:
-            row = await cur.fetchone()
-    if not row:
-        return JSONResponse({"error": "domain not yet enriched"}, status_code=404)
-    assessment = await gemini_assess(dict(row))
-    await save_ai_assessment(domain, assessment)
-    return assessment
+    analysis = await analyze_site(domain)
+    assessment = await gemini_assess(analysis)
+    await save_ai_assessment(domain, assessment, site_analysis=analysis)
+    return {**assessment, "site_analysis": analysis}


 @app.get("/api/export")