e4269225448b21e6a93b3a068406f50953ec62c2
beauty_ai.py:
- Add _scrape_legal_pages(): fetches /aviso-legal, /politica-de-privacidad, /privacidad, /quienes-somos and /legal in parallel. Spanish legal-notice (aviso legal) pages are legally required to state the razón social (registered company name), CIF/NIF (tax ID), address and a contact email, so the legal snippet is passed to the AI to let it identify the registered company name.
- Rewrite _build_beauty_prompt(): full technical profile (SSL, analytics, CMS, load time, word count, GDPR, mobile), all contact channels merged from both site_analyzer and the legal pages, updated assessment rules with clearer HOT/WARM criteria, 700-char search results, and richer portfolio context.
- New JSON schema fields: summary (executive description), pitch_angle (a one-sentence hook in Spanish), all_contacts dict (full lists of emails/phones/whatsapp/social), best_contact_channel, best_contact_value, partnership_signals, revenue_estimate. outreach_email is now a complete, ready-to-send email.
- max_output_tokens raised from 2000 to 4000.
- Contact merge: all_contacts is populated from both site_analyzer and the legal pages; the top-level contact_* fields are filled from the merged data as a fallback.
- Run the DDG search and the legal-page scraping in parallel (no extra wall-clock cost).

index.html (Pipeline):
- Business Summary panel with pitch_angle as an accent subtitle.
- Full all_contacts display: all emails (mailto links), all phones, all WhatsApp numbers (green links) and all social profiles (shortened display).
- partnership_signals chips alongside brand detection.
- outreach_notes shown in amber at the bottom of the contact panel.
- best_contact_channel chip in the contact header.
- The table's contact column now shows best_contact_value when available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
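The commit describes two pieces that are easy to get subtly wrong: fanning out the legal-page requests (alongside the search) and merging contacts from two sources. The sketch below is not the code in beauty_ai.py. It assumes an injected async `fetch(url)` and `search(query)` so it stays library-agnostic, and the names `scrape_legal_pages`, `gather_context`, `merge_contacts`, `fill_contact_fallbacks`, `max_chars`, `contact_email` and `contact_phone` are all hypothetical stand-ins for the real `_scrape_legal_pages()` and `contact_*` fields.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Paths listed in the commit message.
LEGAL_PATHS = ["/aviso-legal", "/politica-de-privacidad", "/privacidad",
               "/quienes-somos", "/legal"]

# Hypothetical injected callables (assumptions, not the project's API).
Fetcher = Callable[[str], Awaitable[Optional[str]]]
Searcher = Callable[[str], Awaitable[list]]


async def scrape_legal_pages(base_url: str, fetch: Fetcher,
                             max_chars: int = 2000) -> str:
    """Fetch every legal path concurrently and join the non-empty bodies."""
    urls = [base_url.rstrip("/") + path for path in LEGAL_PATHS]
    # return_exceptions=True: one failing page must not cancel the others.
    results = await asyncio.gather(*(fetch(u) for u in urls),
                                   return_exceptions=True)
    texts: List[str] = [r for r in results if isinstance(r, str) and r.strip()]
    return "\n\n".join(texts)[:max_chars]


async def gather_context(domain: str, fetch: Fetcher, search: Searcher):
    """Run the search and the legal scrape together: the total wait is
    roughly the slower of the two rather than their sum."""
    legal_text, hits = await asyncio.gather(
        scrape_legal_pages(f"https://{domain}", fetch), search(domain))
    return legal_text, hits


def merge_contacts(site: dict, legal: dict) -> dict:
    """Union each channel's list, keeping first-seen order and no duplicates."""
    merged = {}
    for channel in ("emails", "phones", "whatsapp", "social"):
        combined = site.get(channel, []) + legal.get(channel, [])
        merged[channel] = list(dict.fromkeys(combined))
    return merged


def fill_contact_fallbacks(result: dict, merged: dict) -> dict:
    """Fill empty top-level fields (hypothetical names) from merged data."""
    for field, channel in (("contact_email", "emails"),
                           ("contact_phone", "phones")):
        if not result.get(field) and merged[channel]:
            result[field] = merged[channel][0]
    return result
```

Using `dict.fromkeys` for de-duplication keeps the site_analyzer results ahead of the legal-page results, so the first entry of each list is the one the fallback picks.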
# DomGod: Domain Intelligence Dashboard
Dockerized dashboard for filtering, enriching, scoring, and exporting leads from a 72M-domain dataset.
## Quick start

```shell
docker compose up --build
```
On first boot, the container downloads domains.parquet (~GB) and caches it in ./data/. Subsequent restarts skip the download.
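A minimal sketch of that download-once behaviour, assuming the file is fetched with the standard library. `ensure_parquet` is a hypothetical name and the real entrypoint may use a different HTTP client; `data_dir` and `url` correspond to the documented `DATA_DIR` and `PARQUET_URL` variables.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def ensure_parquet(data_dir: str, url: str,
                   name: str = "domains.parquet") -> Path:
    """Return the local parquet path, downloading it only if not cached."""
    target = Path(data_dir) / name
    if target.is_file() and target.stat().st_size > 0:
        return target  # cached by an earlier boot: skip the download
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.parent / (name + ".part")
    # Stream into a temporary file and rename only once complete, so an
    # interrupted download never leaves a truncated file that a later boot
    # would mistake for a valid cache.
    with urllib.request.urlopen(url) as resp, open(partial, "wb") as out:
        shutil.copyfileobj(resp, out)
    os.replace(partial, target)
    return target
```

Streaming with `shutil.copyfileobj` keeps memory flat for a multi-gigabyte file, and `os.replace` makes the final rename atomic on the same filesystem.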
## Environment variables (docker-compose.yml)

| Variable | Default | Description |
|---|---|---|
| `DATA_DIR` | `/data` | Where the parquet and SQLite files live |
| `PARQUET_URL` | GitHub raw URL | Source parquet |
| `CONCURRENCY_LIMIT` | `50` | Parallel enrichment workers |
| `SCORE_THRESHOLD` | `60` | "Hot lead" threshold |
| `TARGET_TLDS` | `es,com,net` | TLDs to prioritise |
| `TARGET_COUNTRIES` | `ES,GB,DE,FR,RO,PT,AD,IT` | Countries for scoring bonus |
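One way to read these variables with the documented defaults, sketched under the assumption that the app parses them at startup; `Settings` and `load_settings` are hypothetical names, and the `PARQUET_URL` default is left empty because the README only says it is a GitHub raw URL.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


def _csv(value: str) -> List[str]:
    """Split a comma-separated env value, dropping blanks and whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass(frozen=True)
class Settings:
    data_dir: str
    parquet_url: str
    concurrency_limit: int
    score_threshold: int
    target_tlds: List[str]
    target_countries: List[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the README defaults."""
    env = os.environ if env is None else env
    return Settings(
        data_dir=env.get("DATA_DIR", "/data"),
        # Placeholder: the real default is a GitHub raw URL not reproduced here.
        parquet_url=env.get("PARQUET_URL", ""),
        concurrency_limit=int(env.get("CONCURRENCY_LIMIT", "50")),
        score_threshold=int(env.get("SCORE_THRESHOLD", "60")),
        target_tlds=_csv(env.get("TARGET_TLDS", "es,com,net")),
        target_countries=_csv(
            env.get("TARGET_COUNTRIES", "ES,GB,DE,FR,RO,PT,AD,IT")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to test; `concurrency_limit` would typically size something like an `asyncio.Semaphore` around the enrichment workers.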
## Scoring
| Signal | Points |
|---|---|
| Domain is live | +20 |
| SSL expiry < 30 days | +15 |
| No valid SSL | +15 |
| Known CMS detected | +15 |
| No MX record | +10 |
| IP in target country | +10 |
| Shared hosting server | +10 |
| Local business keywords in title | +5 |
Max score: 100. Hot ≥ 80, Warm 50–79, Cold < 50.
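The table translates directly into an additive score. This is a hypothetical sketch: the `Signals` fields and the `KNOWN_CMS` set are assumptions standing in for whatever the enrichment step records. The README does not say whether both SSL rows can fire for one domain; here they can (an expired certificate is both invalid and under 30 days from expiry), which is what makes the stated maximum of 100 reachable.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Default of TARGET_COUNTRIES from the environment table.
TARGET_COUNTRIES = frozenset({"ES", "GB", "DE", "FR", "RO", "PT", "AD", "IT"})
# Hypothetical list: the README does not say which CMSs count as "known".
KNOWN_CMS = frozenset({"wordpress", "joomla", "drupal", "prestashop"})


@dataclass
class Signals:
    is_live: bool = False
    has_valid_ssl: bool = False
    ssl_days_left: Optional[int] = None  # negative once expired
    cms: Optional[str] = None
    has_mx: bool = True
    ip_country: Optional[str] = None
    shared_hosting: bool = False
    local_keywords_in_title: bool = False


def score(s: Signals,
          target_countries: FrozenSet[str] = TARGET_COUNTRIES) -> int:
    """Sum the points from the scoring table, capped at 100."""
    points = 0
    if s.is_live:
        points += 20
    if s.ssl_days_left is not None and s.ssl_days_left < 30:
        points += 15
    if not s.has_valid_ssl:
        points += 15
    if s.cms and s.cms.lower() in KNOWN_CMS:
        points += 15
    if not s.has_mx:
        points += 10
    if s.ip_country in target_countries:
        points += 10
    if s.shared_hosting:
        points += 10
    if s.local_keywords_in_title:
        points += 5
    return min(points, 100)


def tier(points: int) -> str:
    """Hot >= 80, Warm 50-79, Cold < 50."""
    if points >= 80:
        return "hot"
    if points >= 50:
        return "warm"
    return "cold"
```

For example, a live WordPress site on a Spanish IP with a healthy certificate and an MX record scores 20 + 15 + 10 = 45, which is Cold.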
## API

```
GET  /api/stats
GET  /api/domains?tld=es&page=1&limit=100&live_only=false
POST /api/enrich/batch   { "domains": ["example.com"] }
GET  /api/enrich/status
POST /api/enrich/pause
POST /api/enrich/resume
POST /api/enrich/retry
GET  /api/enriched?min_score=60&cms=wordpress&country=ES
GET  /api/export?tier=hot   (streams CSV)
POST /api/score/run
```
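A small standard-library client for these endpoints, as a sketch only. `BASE_URL` is hypothetical (the README does not state the port), and the helpers assume the non-export endpoints accept and return JSON, which the README does not spell out.

```python
import json
import shutil
import urllib.parse
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # hypothetical host and port


def api_url(path: str, **params: Any) -> str:
    """Build an endpoint URL; None is dropped, booleans become true/false."""
    clean = {k: (str(v).lower() if isinstance(v, bool) else v)
             for k, v in params.items() if v is not None}
    query = urllib.parse.urlencode(clean)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def get_json(path: str, **params: Any) -> Any:
    with urllib.request.urlopen(api_url(path, **params)) as resp:
        return json.load(resp)


def post_json(path: str, body: Optional[dict] = None) -> Any:
    data = json.dumps(body or {}).encode("utf-8")
    req = urllib.request.Request(
        api_url(path), data=data, method="POST",
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def download_export(tier: str, dest: str) -> None:
    """Stream /api/export to disk without holding the whole CSV in memory."""
    with urllib.request.urlopen(api_url("/api/export", tier=tier)) as resp, \
            open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

With the server running, `post_json("/api/enrich/batch", {"domains": ["example.com"]})` queues a batch, `get_json("/api/enriched", min_score=60, cms="wordpress", country="ES")` reproduces the filter example above, and `download_export("hot", "hot_leads.csv")` saves the streamed CSV.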