
Malin e426922544 feat: richer B2B assessment — legal page scraping, full contacts, summary

beauty_ai.py:
- Add _scrape_legal_pages(): fetches /aviso-legal, /politica-de-privacidad,
  /privacidad, /quienes-somos, /legal in parallel. Spanish aviso legal pages
  are legally required to list the razón social (registered company name),
  CIF/NIF, address and a contact email; the legal snippet is passed to the AI
  so it can identify the registered company name
- Rewrite _build_beauty_prompt(): full technical profile (SSL, analytics, CMS,
  load time, word count, GDPR, mobile), all contact channels merged from both
  site_analyzer and legal pages, updated assessment rules with clearer HOT/WARM
  criteria, 700-char search results, richer portfolio context
- New JSON schema fields: summary (executive description), pitch_angle (one
  Spanish hook sentence), all_contacts dict (emails/phones/whatsapp/social
  full lists), best_contact_channel, best_contact_value, partnership_signals,
  revenue_estimate; outreach_email is now a complete ready-to-send email
- max_output_tokens raised from 2000 → 4000
- Contact merge: all_contacts populated from both site_analyzer and legal pages;
  top-level contact_* fields filled from merged data as fallback
- Run DDG search and legal page scraping in parallel (no extra wall-clock cost)

index.html (Pipeline):
- Business Summary panel with pitch_angle as accent subtitle
- Full all_contacts display: all emails (mailto links), all phones, all
  WhatsApp (green links), all social profiles (shortened display)
- partnership_signals chips alongside brand detection
- outreach_notes shown in amber at bottom of contact panel
- best_contact_channel chip in contact header
- Table contact column now shows best_contact_value if available

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-13 08:33:14 +02:00

app

feat: richer B2B assessment — legal page scraping, full contacts, summary

2026-05-13 08:33:14 +02:00

.env

feat: initial Dockerized domain intelligence dashboard

2026-04-13 16:22:30 +02:00

.gitignore

chore: add .gitignore and remove tracked __pycache__ files

2026-04-13 18:04:47 +02:00

docker-compose.yml

feat: BeautyLeads B2B cosmetics frontend on port 7788

2026-05-04 19:31:10 +02:00

Dockerfile

feat: BeautyLeads B2B cosmetics frontend on port 7788

2026-05-04 19:31:10 +02:00

README.md

feat: initial Dockerized domain intelligence dashboard

2026-04-13 16:22:30 +02:00

requirements.txt

feat: initial Dockerized domain intelligence dashboard

2026-04-13 16:22:30 +02:00

README.md

DomGod — Domain Intelligence Dashboard

Dockerized dashboard for filtering, enriching, scoring, and exporting leads from a 72M-domain dataset.

Quick start

docker compose up --build

Open http://localhost:6677

On first boot, the container downloads domains.parquet (~GB) and caches it in ./data/. Subsequent restarts skip the download.

Environment variables (docker-compose.yml)

| Variable | Default | Description |
|---|---|---|
| `DATA_DIR` | `/data` | Where parquet + sqlite live |
| `PARQUET_URL` | GitHub raw URL | Source parquet |
| `CONCURRENCY_LIMIT` | `50` | Parallel enrichment workers |
| `SCORE_THRESHOLD` | `60` | "Hot lead" threshold |
| `TARGET_TLDS` | `es,com,net` | TLDs to prioritise |
| `TARGET_COUNTRIES` | `ES,GB,DE,FR,RO,PT,AD,IT` | Countries for scoring bonus |
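
As a rough illustration of how these variables might be consumed, the sketch below reads them with the defaults listed in the table. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project, and `PARQUET_URL` is left out because its default URL is not given here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names mirror the documented variables.
    data_dir: str
    concurrency_limit: int
    score_threshold: int
    target_tlds: list[str]
    target_countries: list[str]


def _split_csv(value: str) -> list[str]:
    """Split a comma-separated string, dropping blanks and surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ by default) using the table's defaults."""
    env = os.environ if env is None else env
    return Settings(
        data_dir=env.get("DATA_DIR", "/data"),
        concurrency_limit=int(env.get("CONCURRENCY_LIMIT", "50")),
        score_threshold=int(env.get("SCORE_THRESHOLD", "60")),
        target_tlds=_split_csv(env.get("TARGET_TLDS", "es,com,net")),
        target_countries=_split_csv(
            env.get("TARGET_COUNTRIES", "ES,GB,DE,FR,RO,PT,AD,IT")
        ),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in isolation, for example `load_settings({"TARGET_TLDS": "es, pt"})`.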

Scoring

| Signal | Points |
|---|---|
| Domain is live | +20 |
| SSL expiry < 30 days | +15 |
| No valid SSL | +15 |
| Known CMS detected | +15 |
| No MX record | +10 |
| IP in target country | +10 |
| Shared hosting server | +10 |
| Local business keywords in title | +5 |

Max score: 100. Hot ≥ 80, Warm 50–79, Cold < 50.
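
The additive scheme above can be sketched as a lookup table plus a tier function. This is a minimal illustration, not the project's actual scorer: the point values and tier cut-offs come from this section, while the signal keys, function names, and lower-case tier labels are assumptions.

```python
# Point values copied from the scoring table; the dictionary keys are
# hypothetical names chosen for this sketch.
SIGNAL_POINTS = {
    "is_live": 20,
    "ssl_expiring_soon": 15,   # SSL expiry < 30 days
    "no_valid_ssl": 15,
    "known_cms": 15,
    "no_mx_record": 10,
    "ip_in_target_country": 10,
    "shared_hosting": 10,
    "local_business_keywords": 5,
}

MAX_SCORE = 100


def score_domain(signals: dict[str, bool]) -> int:
    """Sum the points of every signal that is present, capped at MAX_SCORE."""
    total = sum(
        points for name, points in SIGNAL_POINTS.items() if signals.get(name)
    )
    return min(total, MAX_SCORE)


def tier(score: int) -> str:
    """Map a score to a tier: hot >= 80, warm 50-79, cold < 50."""
    if score >= 80:
        return "hot"
    if score >= 50:
        return "warm"
    return "cold"
```

For example, a live domain with no valid SSL and a known CMS scores 20 + 15 + 15 = 50, which lands at the bottom of the warm tier.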

API

GET  /api/stats
GET  /api/domains?tld=es&page=1&limit=100&live_only=false
POST /api/enrich/batch      { "domains": ["example.com"] }
GET  /api/enrich/status
POST /api/enrich/pause
POST /api/enrich/resume
POST /api/enrich/retry
GET  /api/enriched?min_score=60&cms=wordpress&country=ES
GET  /api/export?tier=hot   (streams CSV)
POST /api/score/run
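
A small client-side sketch of how these endpoints could be called from Python using only the standard library. The helper names are hypothetical, the base URL assumes the Quick start port, and sending the batch body with a JSON `Content-Type` header is an assumption about what the server expects.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:6677"  # assumption: the Quick start address


def build_url(path: str, **params) -> str:
    """Join the base URL, an API path, and optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def enrich_batch_request(domains: list[str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/enrich/batch request."""
    body = json.dumps({"domains": domains}).encode("utf-8")
    return urllib.request.Request(
        build_url("/api/enrich/batch"),
        data=body,
        headers={"Content-Type": "application/json"},  # assumed requirement
        method="POST",
    )


def fetch_json(url: str):
    """GET a URL and decode the JSON response (assumes the API returns JSON)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

With the dashboard running, `fetch_json(build_url("/api/enriched", min_score=60, cms="wordpress", country="ES"))` would query enriched leads, and `urllib.request.urlopen(enrich_batch_request(["example.com"]))` would queue a batch.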
