- loadDomains(): add generation counter so stale auto-advance fetches
cannot overwrite a newer user-triggered search result; snapshot filter
state before the first await so URL reflects what was requested; add
HTTP status check so backend errors surface as toasts rather than
silent empty results; auto-advance now calls loadDomains() without
await so the counter increments correctly per page advance
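The generation-counter guard can be sketched as follows (a minimal Python/asyncio analogue — the real loadDomains() is frontend JS, and all names here are illustrative):

```python
import asyncio

class DomainLoader:
    """Minimal analogue of the generation-counter guard in loadDomains()."""

    def __init__(self):
        self._generation = 0     # bumped at the start of every load
        self.results = None      # what the table currently shows

    async def load(self, label, delay):
        self._generation += 1
        gen = self._generation   # snapshot before the first await
        await asyncio.sleep(delay)       # stands in for the fetch
        if gen != self._generation:
            return               # a newer load started meanwhile: drop result
        self.results = label

async def demo():
    loader = DomainLoader()
    # Slow auto-advance fetch starts first, fast user search second:
    await asyncio.gather(
        loader.load("stale auto-advance page", delay=0.05),
        loader.load("fresh user search", delay=0.01),
    )
    return loader.results

print(asyncio.run(demo()))  # fresh user search
```

The slow fetch finishes last but sees a newer generation and discards itself, so the user-triggered result is never overwritten.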
- beauty_ai: word-boundary regex for short brands (≤5 chars) to stop
'ref' matching 'reference'/'refresh'/'prefer' etc.; merge phones,
whatsapp and social_links from site_analyzer directly into result
(more reliable than AI extraction); add contact_whatsapp and
contact_social fields to AI JSON schema
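The short-brand guard might look like this (illustrative helper; the actual beauty_ai matching code may differ):

```python
import re

def brand_matches(brand, text):
    """Match short brands (<=5 chars) only on word boundaries,
    so 'ref' no longer hits 'reference', 'refresh' or 'prefer'."""
    if len(brand) <= 5:
        return re.search(rf"\b{re.escape(brand)}\b", text, re.IGNORECASE) is not None
    return brand.lower() in text.lower()   # longer brands: plain substring

print(brand_matches("ref", "our reference catalogue"))  # False
print(brand_matches("ref", "visit ref for deals"))      # True
```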
- db: add requeue_beauty() for re-assessing already-assessed domains
- beauty_main: /api/beauty/reassess/batch endpoint using requeue_beauty
- index.html: Re-assess Selected bulk button, per-row ↺ button in
Browse and Pipeline, WhatsApp + social links in Pipeline contact panel
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Assessed/Not assessed filter:
- 'yes' → beauty_lead_quality IS NOT NULL (has been B2B assessed)
- 'no' → beauty_lead_quality IS NULL (never assessed)
- wired through /api/enriched → get_enriched(beauty_assessed=)
Per-page limit:
- options: 100 / 500 / 1000 / 2000 / 5000
- backend cap raised from le=1000 to le=5000
Auto-advance on empty Not-checked page:
- after bulk validate/prescreen, loadDomains reloads the same DuckDB page
- if every domain on that page is now processed (client-side filter → 0 rows)
but the page still returned results, automatically increment page and retry
- prevents "No domains found" after successfully processing a batch
- capped at page 500 to avoid infinite loop
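The auto-advance loop can be sketched as a Python analogue (hypothetical names; the real logic lives in the frontend):

```python
def load_until_nonempty(fetch_page, is_processed, start_page, max_page=500):
    """Auto-advance sketch: if the server page still has rows but every row
    is already processed (client-side filter -> 0 visible), try the next page.
    fetch_page(n) -> list of rows or None; is_processed(row) -> bool."""
    page = start_page
    while page <= max_page:
        rows = fetch_page(page)
        if not rows:                  # truly empty page: stop here
            return page, []
        visible = [r for r in rows if not is_processed(r)]
        if visible:
            return page, visible
        page += 1                     # all rows processed: advance and retry
    return max_page, []               # cap prevents an infinite loop

# Toy data: pages 1-2 fully processed, page 3 has fresh rows.
pages = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"], 4: []}
done = {"a", "b", "c"}
print(load_until_nonempty(pages.get, done.__contains__, 1))  # (3, ['d', 'e'])
```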
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: loadDomains() always hit /api/domains (DuckDB 72M rows) and filtered
niche/site_type/prescreen_status client-side on a random page of 100 domains —
virtually none had been classified, so Live+Beauty+Ecommerce always returned 0.
- loadDomains() now routes to /api/enriched when any enrichment filter is active
(prescreen_status, niche, site_type, country) — all filters are server-side SQLite
- Falls back to /api/domains only when no enrichment filters are set (discovery mode)
- alpha_only and no_sld supported in both modes:
- DuckDB: existing regex support
- SQLite: LIKE patterns (no hyphens/digits) + dot-count (no SLD)
- Add alpha_only/no_sld params to /api/enriched endpoint and get_enriched()
- Fix stale d.classified reference in prescreenOne toast
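The SQLite-side alpha_only / no_sld predicates could be expressed roughly like this (a sketch — the real get_enriched() clauses may differ; GLOB is used here because LIKE alone cannot express "no digits"):

```python
import sqlite3

ALPHA_ONLY = "domain NOT LIKE '%-%' AND domain NOT GLOB '*[0-9]*'"
# no_sld via dot-count: exactly one dot, i.e. example.es but not shop.com.es
NO_SLD = "LENGTH(domain) - LENGTH(REPLACE(domain, '.', '')) = 1"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE enriched_domains (domain TEXT)")
con.executemany("INSERT INTO enriched_domains VALUES (?)",
                [("belleza.es",), ("spa-24.es",), ("shop.com.es",)])
rows = con.execute(
    f"SELECT domain FROM enriched_domains WHERE {ALPHA_ONLY} AND {NO_SLD}"
).fetchall()
print(rows)  # [('belleza.es',)]
```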
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- add page_snippet TEXT column migration
- save prescreener body snippet (600 chars) to page_snippet on upsert
- keyword filter now searches: domain, page_title, page_snippet, beauty_assessment JSON
so "belleza" matches sites whose content/assessment mentions the word even if
the domain name or title doesn't
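A sketch of the widened keyword filter (table/column names assumed from db.py; the beauty_assessment JSON is searched as plain text via LIKE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE enriched_domains
               (domain TEXT, page_title TEXT, page_snippet TEXT,
                beauty_assessment TEXT)""")
con.execute("INSERT INTO enriched_domains VALUES (?,?,?,?)",
            ("miweb.es", "Inicio", "Centro de belleza y estetica", None))
kw = "%belleza%"
rows = con.execute(
    """SELECT domain FROM enriched_domains
       WHERE domain LIKE ? OR page_title LIKE ?
          OR page_snippet LIKE ? OR beauty_assessment LIKE ?""",
    (kw, kw, kw, kw)).fetchall()
print(rows)  # [('miweb.es',)]
```

The row matches on page_snippet even though neither the domain nor the title contains the keyword.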
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- add keyword and tld params to get_enriched() in db.py (LIKE on domain + page_title)
- forward keyword/tld through /api/enriched in beauty_main.py
- rewrite beauty/index.html loadDomains() to pass all filters server-side via URLSearchParams
- track domainsTotal from API response for correct pagination display
- add Pre-screen Selected and B2B Assess Selected bulk action buttons
- add per-row Screen and Assess buttons
- goSearch() resets to page 1 before fetching
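The server-side filter pass-through can be sketched as a Python analogue of the URLSearchParams construction (parameter names assumed; the real code is frontend JS):

```python
from urllib.parse import urlencode

def build_query(params):
    """Drop unset filters, then serialise the rest into a query string."""
    return urlencode({k: v for k, v in params.items() if v not in (None, "")})

qs = build_query({"keyword": "belleza", "tld": "es", "niche": "",
                  "page": 1, "limit": 100})
print(f"/api/enriched?{qs}")  # /api/enriched?keyword=belleza&tld=es&page=1&limit=100
```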
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SQLite locking:
- Enable WAL journal mode in init_db (readers don't block writers)
- Set busy_timeout=30000ms in init_db
- Add timeout=30 to every aiosqlite.connect() across db.py, validator.py,
enricher.py, main.py so connections wait up to 30s instead of crashing
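A minimal sketch of the pragmas using stdlib sqlite3 (the project uses aiosqlite, where the same PRAGMAs apply per connection):

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "enrich.db")
con = sqlite3.connect(path, timeout=30)            # wait up to 30s on locks
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
con.execute("PRAGMA busy_timeout=30000")           # 30,000 ms at the SQLite level
print(mode)  # wal
```

WAL must be set on a file-backed database; in-memory databases silently stay in `memory` journal mode.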
Error status:
- 4xx/5xx HTTP responses now get prescreen_status='error' (server alive
  but broken/blocking) instead of 'live'
- Added 'error' counter to validator stats and orange Error stat box in UI
- Added ps-error CSS class (orange) and filter option in Browse tab
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs:
1. _parse_classify_output stripped <think> block before searching for JSON.
DeepSeek-R1 often puts the JSON array inside the think block (especially
when it "decides" mid-reasoning), so stripping it first destroyed the data.
Fix: search full output first, then inside <think>, then stripped — three
fallback strategies with info logging at each step.
2. Phase 2 save used a bare UPDATE WHERE domain=?, which silently does nothing
   if the domain row doesn't exist yet in enriched_domains.
   Fix: replace it with INSERT ... ON CONFLICT DO UPDATE (a true upsert).
Also adds logger.info lines so container logs show raw DeepSeek output
and parse result count for easy debugging.
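The three-fallback parse can be sketched like this (helper name and regexes are illustrative, not the exact _parse_classify_output code):

```python
import json, re

def parse_classify_output(raw):
    """Try the whole output first, then inside <think>, then with <think>
    stripped; return the first JSON array that parses."""
    think = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    candidates = [raw]
    if think:
        candidates.append(think.group(1))   # JSON often lands mid-reasoning
        candidates.append(re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL))
    for text in candidates:
        m = re.search(r"\[.*\]", text, re.DOTALL)   # first JSON array
        if m:
            try:
                return json.loads(m.group(0))
            except json.JSONDecodeError:
                continue
    return []

out = '<think>Deciding... ["beauty", "ecommerce"]</think> done.'
print(parse_classify_output(out))  # ['beauty', 'ecommerce']
```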
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1 (no AI credits): httpx checks every selected domain concurrently
(30 parallel) with real browser UA — detects live/dead/parked/redirect.
Parked: keyword scan in body/title + known parking host redirect check.
Results saved to DB immediately; dead/parked never reach DeepSeek.
Phase 2 (single DeepSeek call): all live-site titles + snippets bundled
into ONE Replicate/DeepSeek-R1 request → returns niche + type for every
domain in batch (up to 80 per call, parallelised if more).
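The Phase 1 fan-out can be sketched with a semaphore (the real _check_one() issues an httpx GET with a browser User-Agent; here it is a stub so the sketch runs offline):

```python
import asyncio

async def _check_one(sem, domain):
    async with sem:                 # at most `parallel` checks in flight
        await asyncio.sleep(0)      # stands in for the HTTP round-trip
        return domain, "live"       # real statuses: live/dead/parked/redirect

async def prescreen(domains, parallel=30):
    sem = asyncio.Semaphore(parallel)
    pairs = await asyncio.gather(*(_check_one(sem, d) for d in domains))
    return dict(pairs)

print(asyncio.run(prescreen(["a.es", "b.es"])))  # {'a.es': 'live', 'b.es': 'live'}
```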
- app/prescreener.py (new): _check_one(), prescreen_domains(),
classify_with_deepseek(), parking signal lists, same-domain redirect logic
- app/db.py: prescreen_status/niche/site_type/prescreen_at columns +
migrations; save_prescreen_results() upsert helper
- app/main.py: POST /api/prescreen/batch endpoint
- app/static/index.html:
- 🔍 Pre-screen button (disabled while running, shows spinner)
- Niche + Type columns in Browse and Leads tables (.pni/.pty pills)
- Prescreen status colour dot (●) when niche not yet set
- prescreening state flag; result toast shows per-status counts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- db.py: add `language` column to ai_queue; migration; queue_ai() accepts
language param and re-queues with ON CONFLICT UPDATE so changing language works
- main.py: batch and single assess endpoints accept `language` from request body
- enricher.py: ai_worker_loop reads language column, passes to _assess_one()
- replicate_ai.py: assess_domain() and _build_prompt() accept language param;
OUTPUT LANGUAGE section injected into prompt so Gemini writes pitch/email in
the requested language (EN/ES/RO)
- index.html: flag dropdown (🇪🇸/🇬🇧/🇷🇴) next to AI Assess button; aiLang
state default ES; language sent in all batch assessment requests
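The re-queue upsert behaves roughly like this (table and column names assumed from db.py):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ai_queue (domain TEXT PRIMARY KEY, language TEXT)")
sql = """INSERT INTO ai_queue (domain, language) VALUES (?, ?)
         ON CONFLICT(domain) DO UPDATE SET language = excluded.language"""
con.execute(sql, ("miweb.es", "ES"))
con.execute(sql, ("miweb.es", "RO"))   # re-queue: the language change sticks
print(con.execute("SELECT language FROM ai_queue").fetchall())  # [('RO',)]
```

Without the ON CONFLICT clause the second insert would fail on the primary key and the language change would be lost.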
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. scorer: dead sites capped at 5 (was scoring HOT from SSL/CMS signals)
2. Kit Digital: require explicit kit-digital/agente-digitalizador signals;
generic EU logo patterns (fondos-europeos, logo-ue, cofinanciado) removed.
Gemini kit_digital_confirmed now overwrites heuristic in DB.
3. Browse table: social links replaced with compact coloured icon badges
(fb/ig/in/x/tt/yt) linked to the profile URLs
4. site_analyzer: added has_gmb / gmb_url detection (Maps embed, Place links,
LocalBusiness schema); fed to Gemini prompt
5. scorer: +5 no-social, +3 reachable contact; Gemini prompt includes GMB and
social media management as sellable services; modal shows GMB/social status
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Build /data/domains.duckdb on first run (tld+parts columns + ART index)
→ TLD filter goes from ~60s full scan to <100ms index lookup
→ System still works (slower) while index builds in background
- New /api/domains params: alpha_only, no_sld, keyword
→ alpha_only: domains with only letters (no hyphens/numbers)
→ no_sld: parts=2, excludes com.es / net.es patterns
→ keyword: LIKE '%term%' niche search
- /api/domains and /api/enriched now return total count for pagination
- Pagination: shows total matches, page X of Y, Next disabled at last page
- Enrich button: toast notifications instead of alert(), error handling
- Select all on page button, clear selection button
- Stats/TLD breakdown cached after first load (no repeat full scan)
- Header shows index build status (building → ready)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FastAPI backend with DuckDB pushdown queries on the 72M-row parquet dataset
- Async enrichment worker: HTTP, SSL, DNS MX, CMS fingerprint, ip-api.com
- Resumable parquet download with HTTP Range support
- Lead scoring engine (max 100 pts, target countries ES,GB,DE,FR,RO,PT,AD,IT)
- Single-file Alpine.js + Chart.js dashboard on port 6677
- SQLite enrichment DB with job queue and scores tables
- Dockerized with persistent /data volume
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>