Malin/opendia

Fork 0

mirror of https://github.com/aaronjmars/opendia.git synced 2025-12-29 16:16:00 +00:00

Aaron Elijah Mars 3006bceac4 improved design

2025-06-28 20:26:46 +02:00

6.0 KiB

Raw Blame History

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

OpenDia is a browser automation tool that provides an open alternative to Dia, enabling AI models to interact with browsers through the Model Context Protocol (MCP). The project consists of two main components:

Chrome Extension (opendia-extension/) - Provides browser automation capabilities
MCP Server (opendia-mcp/) - Bridges the extension to AI models via WebSocket

Architecture

The system uses a hybrid intelligence architecture:

Pattern Database: Pre-built selectors for Twitter/X, GitHub, and common patterns (99% local operations)
Semantic Analysis: Fallback using HTML semantics and ARIA labels when patterns fail
WebSocket Bridge: Real-time communication between extension and MCP server on port 3000

Core Components

background.js:44-213 - Defines 17 MCP tools for page analysis, content extraction, element interaction, tab management, and data access
content.js:4-95 - Enhanced pattern database with confidence-scored selectors for known sites
server.js:14-143 - MCP protocol implementation with tool registration and WebSocket handling

Development Commands

MCP Server

cd opendia-mcp
npm install
npm start  # Starts server on ws://localhost:3000

Chrome Extension

Go to chrome://extensions/
Enable "Developer mode"
Click "Load unpacked" and select opendia-extension directory
Extension will auto-connect to MCP server

Health Check

curl http://localhost:3001/health  # Check server and extension status

MCP Tool Categories

Web Browser Automation Tools (8 tools)

page_analyze - Two-phase intelligent analysis with element state detection
- Phase 1 (discover): Quick scan with element state (enabled/disabled, clickable)
- Phase 2 (detailed): Full analysis with element fingerprinting and interaction readiness
- Enhanced pattern database with auth, content, search, nav, and form categories
page_extract_content - Smart content extraction with summarization
- Intelligent content detection for articles, search results, and social posts
- Token-efficient summaries with quality metrics and sample items
- Site-specific extraction patterns for Twitter/X, GitHub, Google
element_click - Click elements with auto-scroll and wait conditions
element_fill - Enhanced form filling with proper focus simulation
- Natural focus sequence: click → focus → fill for modern web apps
- Comprehensive event simulation (beforeinput, input, change, composition)
- Validation of successful fill with actual value verification
element_get_state - Get detailed element state (disabled, clickable, focusable, empty)
page_navigate - Navigate with optional element wait conditions
page_wait_for - Wait for elements or text to appear
page_scroll - Scroll pages in various directions

Tab Management Tools (4 tools)

tab_create - Create new tabs with advanced options
tab_close - Close tabs with flexible targeting
tab_list - Get comprehensive tab information
tab_switch - Switch between tabs intelligently

Browser Data Access Tools (5 tools)

get_bookmarks - Get all bookmarks or search for specific ones
add_bookmark - Add new bookmarks with folder support
get_history - Search browser history with comprehensive filters
get_selected_text - Get currently selected text with rich metadata
get_page_links - Get all hyperlinks with filtering options

Key Implementation Details

Phase 1 & 2 Token Efficiency Improvements

Element Fingerprinting (content.js:771-778): Compact representation using tag.class@context.position format
Two-Phase Analysis (content.js:203-323): Quick discovery vs detailed analysis with separate registries
Enhanced Pattern Database (content.js:4-95): Intent-based categorization (auth, content, search, nav, form)
Viewport-Aware Analysis (content.js:838-859): Intersection observer for visibility detection
Intelligent Element Scoring (content.js:861-869): Confidence-based filtering and ranking

Phase 2 Content & Performance Optimization

Smart Content Summarization (content.js:662-717): Token-efficient summaries instead of full content
Site-Specific Extractors (content.js:603-659): Pattern-based extraction for Twitter/X, GitHub, Google
Token Usage Tracking (content.js:115-263): Performance metrics with localStorage persistence
Adaptive Optimization (content.js:152-166): Auto-adjustment of limits based on success rates
Method Performance Tracking (content.js:188-211): Success rate optimization per page type/intent

Focus & State Enhancement (Latest)

Enhanced Focus Simulation (content.js:1218-1254): Mouse events + focus + React state update
Element State Detection (content.js:1311-1409): Comprehensive disabled/clickable/focusable analysis
Event Sequence Simulation (content.js:1270-1299): beforeinput, input, change, composition events
Modern Web App Support: Handles React, Vue, Angular state management requirements

Core Architecture

Element IDs generated dynamically with dual registries for quick/detailed phases
Pattern matching prioritizes enhanced patterns → legacy patterns → semantic analysis
WebSocket connection includes ping/pong heartbeat every 30 seconds
Tool responses include execution time, confidence metrics, and token estimates
CSP-aware JavaScript execution with multiple fallback strategies

Security Considerations

The extension requires broad permissions (<all_urls>, tabs, scripting) and establishes localhost WebSocket connections. This is intentional for automation capabilities but should only be used in trusted environments.

Testing

Use the extension popup to test connection status and tool availability. The MCP server provides real-time status via WebSocket connection state and tool registration logs.

6.0 KiB Raw Blame History