CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
OpenDia is a browser automation tool that provides an open alternative to Dia, enabling AI models to interact with browsers through the Model Context Protocol (MCP). The project consists of two main components:
- Chrome Extension (opendia-extension/) - Provides browser automation capabilities
- MCP Server (opendia-mcp/) - Bridges the extension to AI models via WebSocket
Architecture
The system uses a hybrid intelligence architecture:
- Pattern Database: Pre-built selectors for Twitter/X, GitHub, and common patterns (99% local operations)
- Semantic Analysis: Fallback using HTML semantics and ARIA labels when patterns fail
- WebSocket Bridge: Real-time communication between extension and MCP server on port 3000
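The pattern-first, semantic-fallback order can be sketched roughly as below. This is a minimal illustration and not the code in content.js: PATTERNS, resolveSelector, the pattern shape, the example.com entry, and the confidence values are all hypothetical, and the semantic step is reduced to a single ARIA-label attribute selector.

```javascript
// Hypothetical pattern database: hostname -> intent -> selector with confidence.
// The example.com entry is a placeholder, not a real site pattern.
const PATTERNS = {
  "example.com": {
    search: { selector: "#search-box", confidence: 0.9 },
  },
};

// Try the local pattern database first; fall back to a generic
// ARIA-based selector when no pattern is known for the site and intent.
function resolveSelector(hostname, intent) {
  const known = PATTERNS[hostname]?.[intent];
  if (known) {
    return { ...known, method: "pattern" };
  }
  return {
    selector: `[aria-label*="${intent}" i]`,
    confidence: 0.5, // assumed lower score for the semantic fallback
    method: "semantic",
  };
}
```

Returning the method alongside the selector lets a caller see which path produced the match, which is one way the confidence metrics mentioned later could be reported.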
Core Components
- background.js:44-213 - Defines 8 MCP tools for page analysis, content extraction, and element interaction
- content.js:4-50 - Pattern database with confidence-scored selectors for known sites
- server.js:14-143 - MCP protocol implementation with tool registration and WebSocket handling
Development Commands
MCP Server
cd opendia-mcp
npm install
npm start # Starts server on ws://localhost:3000
Chrome Extension
- Go to chrome://extensions/
- Enable "Developer mode"
- Click "Load unpacked" and select the opendia-extension directory
- The extension will auto-connect to the MCP server
Health Check
curl http://localhost:3001/health # Check server and extension status
MCP Tool Categories
Core Automation Tools (6 tools)
- page_analyze - Two-phase intelligent analysis with element state detection
  - Phase 1 (discover): Quick scan with element state (enabled/disabled, clickable)
  - Phase 2 (detailed): Full analysis with element fingerprinting and interaction readiness
  - Enhanced pattern database with auth, content, search, nav, and form categories
- page_extract_content - Smart content extraction with summarization
  - Intelligent content detection for articles, search results, and social posts
  - Token-efficient summaries with quality metrics and sample items
  - Site-specific extraction patterns for Twitter/X, GitHub, Google
- element_click - Click elements with auto-scroll and wait conditions
- element_fill - Enhanced form filling with proper focus simulation
  - Natural focus sequence: click → focus → fill for modern web apps
  - Comprehensive event simulation (beforeinput, input, change, composition)
  - Validation of successful fill with actual value verification
- page_navigate - Navigate with optional element wait conditions
- page_wait_for - Wait for elements or text to appear
Element State Tools (1 tool)
- element_get_state - Get detailed element state (disabled, clickable, focusable, empty)
Analytics Tools (2 tools)
- get_analytics - Token usage analytics and performance metrics
- clear_analytics - Reset performance tracking data
Legacy Tools (2 tools)
- browser_navigate - Basic navigation (compatibility)
- browser_execute_script - JavaScript execution with CSP fallbacks
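For orientation, an MCP client invokes any of these tools with a JSON-RPC 2.0 tools/call request. The sketch below builds one for page_analyze. The helper buildToolCall is hypothetical, and the argument names intent_hint and phase are assumptions for illustration, so check the tool schemas in background.js for the real parameters.

```javascript
// Build a JSON-RPC 2.0 request for MCP's "tools/call" method.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names (intent_hint, phase) are assumed, not taken from the tool schema.
const request = buildToolCall(1, "page_analyze", {
  intent_hint: "login",
  phase: "discover",
});
console.log(JSON.stringify(request, null, 2));
```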
Key Implementation Details
Phase 1 & 2 Token Efficiency Improvements
- Element Fingerprinting (content.js:771-778): Compact representation using tag.class@context.position format
- Two-Phase Analysis (content.js:203-323): Quick discovery vs detailed analysis with separate registries
- Enhanced Pattern Database (content.js:4-95): Intent-based categorization (auth, content, search, nav, form)
- Viewport-Aware Analysis (content.js:838-859): Intersection observer for visibility detection
- Intelligent Element Scoring (content.js:861-869): Confidence-based filtering and ranking
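To make the tag.class@context.position format concrete, here is a minimal sketch. The function name fingerprint and the descriptor fields (tag, classes, context, position) are hypothetical, and how content.js actually derives the context and position is not shown in this file.

```javascript
// Build a compact fingerprint in the tag.class@context.position shape.
// `el` is a plain descriptor object here (hypothetical), not a live DOM node.
function fingerprint(el) {
  const tag = el.tag.toLowerCase();
  // Use only the first class to keep the fingerprint short.
  const cls = el.classes.length > 0 ? el.classes[0] : "";
  const head = cls ? `${tag}.${cls}` : tag;
  return `${head}@${el.context}.${el.position}`;
}

fingerprint({ tag: "BUTTON", classes: ["primary"], context: "form", position: 2 });
// "button.primary@form.2"
```

A string of this shape costs far fewer tokens than a serialized element, which fits the token-efficiency goal of this section.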
Phase 2 Content & Performance Optimization
- Smart Content Summarization (content.js:662-717): Token-efficient summaries instead of full content
- Site-Specific Extractors (content.js:603-659): Pattern-based extraction for Twitter/X, GitHub, Google
- Token Usage Tracking (content.js:115-263): Performance metrics with localStorage persistence
- Adaptive Optimization (content.js:152-166): Auto-adjustment of limits based on success rates
- Method Performance Tracking (content.js:188-211): Success rate optimization per page type/intent
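One plausible shape for adjusting limits from success rates is sketched below. The function adjustLimit, its thresholds (0.7 and 0.9), the step size, and the bounds are all assumed values for illustration. The actual rules live in content.js and may differ.

```javascript
// Adjust an element limit from a recent success rate in the range 0..1.
// Thresholds, step size, and bounds are assumptions, not values from content.js.
function adjustLimit(currentLimit, successRate, { min = 5, max = 50, step = 5 } = {}) {
  if (successRate < 0.7) {
    // Low success: return more candidates on the next call.
    return Math.min(max, currentLimit + step);
  }
  if (successRate > 0.9) {
    // High success: shrink the result set to save tokens.
    return Math.max(min, currentLimit - step);
  }
  // In between: leave the limit unchanged.
  return currentLimit;
}
```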
Focus & State Enhancement (Latest)
- Enhanced Focus Simulation (content.js:1218-1254): Mouse events + focus + React state update
- Element State Detection (content.js:1311-1409): Comprehensive disabled/clickable/focusable analysis
- Event Sequence Simulation (content.js:1270-1299): beforeinput, input, change, composition events
- Modern Web App Support: Handles React, Vue, Angular state management requirements
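The event sequence idea can be sketched as follows. This is a simplified illustration under stated assumptions: simulateFill is a hypothetical name, it dispatches generic Event objects rather than InputEvent or CompositionEvent, it omits the mouse and composition events, and it skips the framework-specific value setters that React-style inputs may need.

```javascript
// Dispatch a simplified fill sequence on an EventTarget-like element.
// A fuller implementation would likely use InputEvent/CompositionEvent.
function simulateFill(target, value) {
  const fire = (type, bubbles = true) =>
    target.dispatchEvent(new Event(type, { bubbles }));

  fire("focus", false); // native focus events do not bubble
  fire("beforeinput");
  target.value = value;
  fire("input");
  fire("change");

  // Verify that the value actually stuck, as the fill tool is described to do.
  return target.value === value;
}
```

In a page context this could be called as simulateFill(document.querySelector("input"), "hello"). The return value mirrors the "actual value verification" step listed under element_fill.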
Core Architecture
- Element IDs generated dynamically with dual registries for quick/detailed phases
- Pattern matching prioritizes enhanced patterns → legacy patterns → semantic analysis
- WebSocket connection includes ping/pong heartbeat every 30 seconds
- Tool responses include execution time, confidence metrics, and token estimates
- CSP-aware JavaScript execution with multiple fallback strategies
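The ping/pong heartbeat can be modeled as a small state machine. The sketch below is an assumption about the general approach, not the project's implementation: createHeartbeat, sendPing, and onDead are hypothetical names, and the wire format of the ping message is not specified in this file.

```javascript
// Track liveness for a ping/pong heartbeat.
// `sendPing` transmits a ping; the caller invokes `onPong()` when a pong arrives.
function createHeartbeat(sendPing, onDead) {
  let awaitingPong = false;
  return {
    // Call once per interval (every 30 seconds, per the list above).
    tick() {
      if (awaitingPong) {
        onDead(); // the previous ping was never answered
        return;
      }
      awaitingPong = true;
      sendPing();
    },
    onPong() {
      awaitingPong = false;
    },
  };
}

// Wiring sketch (socket and message format are assumptions):
//   const hb = createHeartbeat(() => socket.send("ping"), () => socket.close());
//   const timer = setInterval(() => hb.tick(), 30000);
```

Keeping the timer outside the helper makes the liveness logic easy to exercise without waiting for real intervals.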
Security Considerations
The extension requires broad permissions (<all_urls>, tabs, scripting) and establishes localhost WebSocket connections. This is intentional for automation capabilities but should only be used in trusted environments.
Testing
Use the extension popup to test connection status and tool availability. The MCP server provides real-time status via WebSocket connection state and tool registration logs.