Kit Digital detection (enricher.py):
- Scans img src/alt/srcset for digitalizadores, kit-digital, fondos-europeos, etc.
- Scans page text for Kit Digital, Agente Digitalizador, Next Generation EU, PRTR
- Scans links for acelerapyme.es, red.es, kit-digital refs
- +20 score bonus for confirmed Kit Digital sites (proven IT buyers)
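
The detection bullets above can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the code in enricher.py: the function names (detect_kit_digital, apply_bonus), the signal labels, and the exact matching rules are assumptions; only the keywords and the +20 bonus come from the notes.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Keywords from the commit notes; the real lists in enricher.py may be longer.
IMG_KEYWORDS = ("digitalizadores", "kit-digital", "fondos-europeos")
TEXT_PHRASES = ("kit digital", "agente digitalizador", "next generation eu", "prtr")
LINK_DOMAINS = ("acelerapyme.es", "red.es")
LINK_KEYWORDS = ("kit-digital",)
KIT_DIGITAL_BONUS = 20


def _host_matches(href, domain):
    """True if href points at domain or one of its subdomains."""
    try:
        host = urlparse(href).hostname or ""
    except ValueError:
        return False
    return host == domain or host.endswith("." + domain)


class _SignalParser(HTMLParser):
    """Collects img/link signals and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.signals = set()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "").lower() for k, v in attrs}
        if tag == "img":
            blob = " ".join(attr.get(k, "") for k in ("src", "alt", "srcset"))
            self.signals.update(f"img:{kw}" for kw in IMG_KEYWORDS if kw in blob)
        elif tag == "a":
            href = attr.get("href", "")
            self.signals.update(f"link:{d}" for d in LINK_DOMAINS if _host_matches(href, d))
            self.signals.update(f"link:{kw}" for kw in LINK_KEYWORDS if kw in href)

    def handle_data(self, data):
        self.text_parts.append(data)


def detect_kit_digital(html):
    """Return (confirmed, sorted list of signal labels) for one HTML page."""
    parser = _SignalParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts).lower()
    for phrase in TEXT_PHRASES:
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            parser.signals.add(f"text:{phrase}")
    signals = sorted(parser.signals)
    return bool(signals), signals


def apply_bonus(score, confirmed):
    """Add the Kit Digital bonus to an existing lead score."""
    return score + KIT_DIGITAL_BONUS if confirmed else score
```

Matching link domains on the parsed hostname rather than by substring avoids false positives such as prered.es matching red.es.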
Contact extraction (enricher.py):
- Pulls mailto/tel/wa.me links from HTML
- Extracts email addresses (regex) and phone numbers (ES format)
- Detects social media links (FB, IG, LinkedIn, Twitter, TikTok)
- Stored as JSON in contact_info column
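
A rough sketch of the extraction steps above, assuming regex-only parsing with the standard library. The patterns, the function names (extract_contacts, to_contact_json), and the JSON key names are illustrative assumptions; the actual shape of the contact_info column is not shown in this commit.

```python
import json
import re
from urllib.parse import urlparse

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed ES format: optional +34/0034 prefix, then 9 digits starting with 6-9,
# with optional spaces, dots or hyphens between digits.
PHONE_ES_RE = re.compile(r"(?<!\d)(?:\+34|0034)?[\s.-]?[6-9](?:[\s.-]?\d){8}(?!\d)")
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
SOCIAL_DOMAINS = {
    "facebook": "facebook.com",
    "instagram": "instagram.com",
    "linkedin": "linkedin.com",
    "twitter": "twitter.com",
    "tiktok": "tiktok.com",
}


def _normalize_phone(raw):
    """Keep the last 9 digits, which drops a +34/0034 prefix (ES numbers only)."""
    return re.sub(r"\D", "", raw)[-9:]


def _host(href):
    try:
        return urlparse(href).hostname or ""
    except ValueError:
        return ""


def extract_contacts(html):
    """Collect emails, phones, WhatsApp numbers and social links from raw HTML."""
    emails, phones, whatsapp, social = [], [], [], {}
    for href in HREF_RE.findall(html):
        low = href.lower()
        if low.startswith("mailto:"):
            emails.append(low[len("mailto:"):].split("?")[0].strip())
        elif low.startswith("tel:"):
            phones.append(_normalize_phone(low[len("tel:"):]))
        elif "wa.me/" in low:
            match = re.search(r"wa\.me/\+?(\d+)", low)
            if match:
                whatsapp.append(match.group(1))
        else:
            host = _host(href)
            for name, domain in SOCIAL_DOMAINS.items():
                if host == domain or host.endswith("." + domain):
                    social.setdefault(name, href)
    # Run the regexes over visible text only, so hrefs are not counted twice.
    text = TAG_RE.sub(" ", html)
    emails += [e.lower() for e in EMAIL_RE.findall(text)]
    phones += [_normalize_phone(p) for p in PHONE_ES_RE.findall(text)]
    return {
        "emails": list(dict.fromkeys(emails)),
        "phones": list(dict.fromkeys(phones)),
        "whatsapp": list(dict.fromkeys(whatsapp)),
        "social": social,
    }


def to_contact_json(html):
    """Serialize the extracted contacts for storage in a text column."""
    return json.dumps(extract_contacts(html), ensure_ascii=False)
```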
Gemini via Replicate (replicate_ai.py):
- Assesses lead quality (HOT/WARM/COLD), Kit Digital confirmation
- Identifies best contact channel + actual value (email/phone/WA)
- Writes Spanish cold-call/email pitch angle
- Lists services likely needed + outreach notes
- 3 concurrent requests, 90s timeout, JSON output parsing
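
The concurrency, timeout and JSON-parsing behaviour in the last bullet could look roughly like this. The sketch deliberately leaves out the Replicate client: call_model is a hypothetical coroutine supplied by the caller that returns the model's raw text, and assess_all and parse_json_output are illustrative names, not functions from replicate_ai.py.

```python
import asyncio
import json
import re

AI_CONCURRENCY = 3   # matches "3 concurrent requests"
AI_TIMEOUT_S = 90    # matches "90s timeout"


def parse_json_output(raw):
    """Pull the outermost {...} object out of model text.

    Tolerates code fences or prose around the JSON; returns None on failure.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


async def assess_all(leads, call_model, concurrency=AI_CONCURRENCY, timeout=AI_TIMEOUT_S):
    """Run call_model(lead) for every lead with bounded concurrency.

    Returns a list of (lead, parsed_dict_or_None) in input order.
    """
    sem = asyncio.Semaphore(concurrency)

    async def assess_one(lead):
        async with sem:
            try:
                raw = await asyncio.wait_for(call_model(lead), timeout=timeout)
            except asyncio.TimeoutError:
                return lead, None  # timed out; caller may requeue
        return lead, parse_json_output(raw)

    return await asyncio.gather(*(assess_one(lead) for lead in leads))
```

A timed-out or unparseable response comes back as None instead of raising, so one bad lead does not abort the whole batch.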
DB: migration adds columns kit_digital, kit_digital_signals, contact_info,
ai_assessment, ai_lead_quality, ai_pitch, ai_contact_channel/value, plus the
ai_queue table
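
For illustration, an idempotent migration along these lines might look like the sketch below. It assumes SQLite and a main table named domains, neither of which is stated in this commit. The column types, the reading of "ai_contact_channel/value" as two columns, and the ai_queue schema are likewise assumptions.

```python
import sqlite3

# Column names from the commit notes; SQL types are assumed.
NEW_COLUMNS = {
    "kit_digital": "INTEGER DEFAULT 0",
    "kit_digital_signals": "TEXT",
    "contact_info": "TEXT",          # JSON string
    "ai_assessment": "TEXT",
    "ai_lead_quality": "TEXT",       # HOT / WARM / COLD
    "ai_pitch": "TEXT",
    "ai_contact_channel": "TEXT",
    "ai_contact_value": "TEXT",      # assumed name for the "/value" column
}


def migrate(conn, table="domains"):
    """Add any missing columns and create ai_queue; safe to run repeatedly."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for name, decl in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
    # Hypothetical queue schema: one row per domain awaiting AI assessment.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ai_queue ("
        "domain TEXT PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()
```

Checking PRAGMA table_info before each ALTER TABLE keeps the migration re-runnable on databases that already have some of the columns.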
UI: Kit Digital 🏅 badge, AI quality pill (clickable modal with full
assessment), contact chips (email/phone/WA/social), AI Assess button,
Kit Digital only filter, AI queue status in enrichment tab
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
version: "3.9"
services:
  dashboard:
    build: .
    ports:
      - "6677:6677"
    volumes:
      - ./data:/data
    environment:
      - DATA_DIR=/data
      - PARQUET_URL=https://github.com/digitalcortex/72m-domains-dataset/raw/refs/heads/master/domains.parquet
      - CONCURRENCY_LIMIT=50
      - SCORE_THRESHOLD=60
      - TARGET_TLDS=es,com,net
      - TARGET_COUNTRIES=ES,GB,DE,FR,RO,PT,AD,IT
      # Token is interpolated from the host environment or a .env file; never commit the real value.
      - REPLICATE_API_TOKEN=${REPLICATE_API_TOKEN}
      - AI_CONCURRENCY=3
    restart: unless-stopped
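
On the application side, the environment variables set in the compose file might be read along these lines. This is a hypothetical loader, not code from the repository: the Settings class, load_settings, and the fallback defaults (copied from the compose values) are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    data_dir: str
    concurrency_limit: int
    score_threshold: int
    target_tlds: tuple
    target_countries: tuple
    replicate_api_token: str
    ai_concurrency: int


def _csv(value):
    """Split a comma-separated value into a tuple of non-empty items."""
    return tuple(part.strip() for part in value.split(",") if part.strip())


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        data_dir=env.get("DATA_DIR", "/data"),
        concurrency_limit=int(env.get("CONCURRENCY_LIMIT", "50")),
        score_threshold=int(env.get("SCORE_THRESHOLD", "60")),
        target_tlds=_csv(env.get("TARGET_TLDS", "es,com,net")),
        target_countries=_csv(env.get("TARGET_COUNTRIES", "ES")),
        # Empty string means "AI assessment disabled" in this sketch.
        replicate_api_token=env.get("REPLICATE_API_TOKEN", ""),
        ai_concurrency=int(env.get("AI_CONCURRENCY", "3")),
    )
```

Passing a plain dict as env makes the loader easy to exercise in tests without touching the real environment.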