Files
DomGod/docker-compose.yml
Malin faca4b6e1a feat: Gemini AI assessment, Kit Digital detection, contact extraction
Kit Digital detection (enricher.py):
- Scans img src/alt/srcset for digitalizadores, kit-digital, fondos-europeos etc
- Scans page text for Kit Digital, Agente Digitalizador, Next Generation EU, PRTR
- Scans links for acelerapyme.es, red.es, kit-digital refs
- +20 score bonus for Kit Digital confirmed sites (proven IT buyers)

Contact extraction (enricher.py):
- Pulls mailto/tel/wa.me links from HTML
- Extracts email addresses via regex, phone numbers (ES format)
- Detects social media links (FB, IG, LinkedIn, Twitter, TikTok)
- Stored as JSON in contact_info column

Gemini via Replicate (replicate_ai.py):
- Assesses lead quality (HOT/WARM/COLD), Kit Digital confirmation
- Identifies best contact channel + actual value (email/phone/WA)
- Writes Spanish cold-call/email pitch angle
- Lists services likely needed + outreach notes
- 3 concurrent requests, 90s timeout, JSON output parsing

DB: migration adds kit_digital, kit_digital_signals, contact_info,
    ai_assessment, ai_lead_quality, ai_pitch, ai_contact_channel/value,
    ai_queue table

UI: Kit Digital 🏅 badge, AI quality pill (clickable modal with full
    assessment), contact chips (email/phone/WA/social), AI Assess button,
    Kit Digital only filter, AI queue status in enrichment tab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 17:25:06 +02:00

19 lines
525 B
YAML

version: "3.9"
services:
dashboard:
build: .
ports:
- "6677:6677"
volumes:
- ./data:/data
environment:
- DATA_DIR=/data
- PARQUET_URL=https://github.com/digitalcortex/72m-domains-dataset/raw/refs/heads/master/domains.parquet
- CONCURRENCY_LIMIT=50
- SCORE_THRESHOLD=60
- TARGET_TLDS=es,com,net
- TARGET_COUNTRIES=ES,GB,DE,FR,RO,PT,AD,IT
- REPLICATE_API_TOKEN=r8_6kV2NWMQyPVB9JILHJprrXJJh4vWazA22Osyj
- AI_CONCURRENCY=3
restart: unless-stopped