# OpenDia ✳️
OpenDia is an open alternative to Dia. Connect to your browser with MCP & do anything.
> Exposes browser functions through the Model Context Protocol (MCP), allowing AI models to interact with browser capabilities.
## ⚠️ Security Warning
**IMPORTANT**: This extension is provided as-is with no security guarantees. By using this extension, you acknowledge and accept the following risks:
- The extension requires broad browser permissions to function
- It establishes WebSocket connections to localhost
- It allows external applications to control browser functions
- We cannot guarantee the security of data transmitted through the extension
- Use at your own risk and only in trusted environments
## Quick Start
### Prerequisites
- Node.js (v14 or higher)
- Google Chrome browser
### Installation
1. **Set up the MCP Server**
   ```bash
   cd opendia-mcp
   npm install
   npm start
   ```

2. **Install the Chrome Extension**
   - Open Chrome and go to `chrome://extensions/`
   - Enable "Developer mode" in the top right
   - Click "Load unpacked" and select the `opendia-extension` directory
   - The extension icon will appear in your browser toolbar

3. **Configure your MCP client**
   Add the browser server to your MCP configuration:
   ```json
   {
     "mcpServers": {
       "opendia": {
         "command": "node",
         "args": ["/path/to/opendia/opendia-mcp/server.js"],
         "env": {}
       }
     }
   }
   ```
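
MCP clients typically launch the server from their own working directory, so an absolute path in `args` is the safer choice. The following is a minimal sketch that generates the entry above from a local checkout path; `buildMcpConfig` is a hypothetical helper and not part of the OpenDia codebase.

```javascript
// Sketch: build the MCP client entry for OpenDia from a local checkout path.
// buildMcpConfig is a hypothetical helper, not part of the OpenDia codebase.
const path = require("path");

function buildMcpConfig(repoDir) {
  // Resolve to an absolute path so the client can start the server
  // regardless of its own working directory.
  const serverPath = path.resolve(repoDir, "opendia-mcp", "server.js");
  return {
    mcpServers: {
      opendia: { command: "node", args: [serverPath], env: {} },
    },
  };
}

// Print JSON that can be pasted into the client's configuration file.
console.log(JSON.stringify(buildMcpConfig("/path/to/opendia"), null, 2));
```

Replace `/path/to/opendia` with the directory where the repository was cloned.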
## Enhanced MCP Tools (17 Total)
### 🌐 Web Browser Automation Tools (8 Tools)

- **page_analyze**: Two-phase intelligent page analysis with anti-detection bypass
  - Phase 1 (discover): Quick scan with element state detection
  - Phase 2 (detailed): Full analysis with element fingerprinting
  - Enhanced pattern database with confidence scoring
  - Supports Twitter/X, GitHub, and universal patterns
- **page_extract_content**: Smart content extraction with summarization
  - Extract articles, search results, or social media posts
  - Token-efficient summaries with quality metrics
  - Site-specific extraction patterns for Twitter/X, GitHub, Google
- **element_click**: Reliable element clicking with smart targeting
  - Uses element IDs from page analysis
  - Supports different click types (left, right, double)
  - Auto-scrolls elements into view
- **element_fill**: Enhanced form filling with anti-detection bypass
  - Specialized bypasses for Twitter/X, LinkedIn, Facebook
  - Natural focus sequence: click → focus → fill
  - Comprehensive event simulation for modern web apps
- **element_get_state**: Get detailed element state information
  - Check if elements are disabled, clickable, visible
  - Get current values and element properties
  - Essential for conditional automation logic
- **page_navigate**: Enhanced navigation with wait conditions
  - Navigate to URLs with optional element wait conditions
  - Timeout handling and error reporting
- **page_wait_for**: Conditional waiting for elements or text
  - Wait for elements to become visible
  - Wait for specific text to appear on page
  - Configurable timeout periods
- **page_scroll**: Scroll pages in various directions
  - Critical for long pages and infinite scroll content
  - Supports smooth scrolling and element targeting
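
MCP clients invoke tools such as these with JSON-RPC 2.0 `tools/call` requests. The sketch below shows what two such requests might look like. The tool names come from the list above, but the argument names (`phase`, `element_id`, `click_type`) are assumptions made for illustration; the real input schemas may differ.

```javascript
// Sketch: JSON-RPC 2.0 envelopes an MCP client would send to call a tool.
// Tool names come from the list above; the argument names are assumed
// for illustration and may not match the real input schemas.
let nextId = 1;

function buildToolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++, // each request needs a unique id so responses can be matched
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Analyze the page first, then click an element it reported.
const analyze = buildToolCall("page_analyze", { phase: "discover" });
const click = buildToolCall("element_click", { element_id: "el_1", click_type: "left" });

console.log(JSON.stringify([analyze, click], null, 2));
```

In practice an MCP client library builds these envelopes for you; the sketch only makes the wire format visible.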
### 📑 Tab Management Tools (4 Tools)

- **tab_create**: Create new tabs with advanced options
  - Create tabs with or without URLs
  - Control tab activation and focus
  - Wait for elements to load after creation
  - Perfect for multi-tab workflows
- **tab_close**: Close tabs with flexible targeting
  - Close current tab, specific tab by ID, or multiple tabs
  - Batch close operations for cleanup
  - Safe handling of tab closure
- **tab_list**: Get comprehensive tab information
  - List all open tabs with details (title, URL, status)
  - Filter by current window or all windows
  - Track active tab and tab states
- **tab_switch**: Switch between tabs intelligently
  - Switch to specific tabs by ID
  - Focus windows automatically
  - Essential for multi-tab automation workflows
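
A multi-tab workflow built from these tools might be sketched as follows. `callTool` is a hypothetical async function supplied by whatever MCP client is in use, and the argument and result field names (`url`, `active`, `content_type`, `tab_id`) are assumptions, not the documented schemas.

```javascript
// Sketch: a multi-tab workflow expressed as a sequence of tool calls.
// callTool is a hypothetical async function supplied by the MCP client;
// argument and result field names (url, tab_id, ...) are assumptions.
async function extractInNewTab(callTool, url) {
  const tab = await callTool("tab_create", { url, active: true });
  try {
    return await callTool("page_extract_content", { content_type: "article" });
  } finally {
    // Always clean up the tab, even if extraction fails.
    await callTool("tab_close", { tab_id: tab.tab_id });
  }
}

// Stand-in client that records calls instead of talking to a browser.
const calls = [];
async function fakeCallTool(name, args) {
  calls.push(name);
  return name === "tab_create" ? { tab_id: 42 } : { ok: true };
}

const done = extractInNewTab(fakeCallTool, "https://example.com").then((result) => {
  console.log(calls.join(" -> "));
  return result;
});
```

Closing the tab in `finally` keeps long-running automations from accumulating stray tabs when a step throws.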
### 📊 Browser Data Access Tools (5 Tools)

- **get_bookmarks**: Get all bookmarks or search for specific ones
  - Search bookmarks by query string
  - Returns structured bookmark data with hierarchy
- **add_bookmark**: Add new bookmarks
  - Create bookmarks with title and URL
  - Optional parent folder support for organization
- **get_history**: Search browser history with comprehensive filters
  - Advanced filtering by date, domains, visit count, keywords
  - Sophisticated sorting and metadata extraction
  - Perfect for finding previous work and research
- **get_selected_text**: Get currently selected text on the page
  - Rich metadata about selection context and position
  - Includes parent element information and page context
  - Configurable length limits and truncation
- **get_page_links**: Get all hyperlinks on current page with filtering
  - Smart filtering for internal/external links
  - Domain-specific filtering options
  - Essential for link analysis and navigation planning
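
To make the internal/external distinction in `get_page_links` concrete, here is one way such filtering could work, using the standard `URL` API. `classifyLinks` is a hypothetical helper written for this illustration; the extension's actual filtering logic may differ.

```javascript
// Sketch: classify hrefs as internal or external relative to the current page.
// classifyLinks is a hypothetical helper; the extension's real filtering may differ.
function classifyLinks(pageUrl, hrefs) {
  const page = new URL(pageUrl);
  const result = { internal: [], external: [] };
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, page); // resolves relative links against the page URL
    } catch {
      continue; // skip malformed hrefs
    }
    // Skip non-web schemes such as mailto: or javascript:
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    const bucket = url.hostname === page.hostname ? result.internal : result.external;
    bucket.push(url.href);
  }
  return result;
}

const links = classifyLinks("https://example.com/docs/", [
  "/about",
  "guide.html",
  "https://github.com/",
  "mailto:hi@example.com",
]);
console.log(links);
```

Comparing hostnames exactly treats subdomains as external; a domain-specific filter would need a looser comparison.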
## 🚀 Key Features
### Hybrid Intelligence Architecture
- **99% Local Operations**: Pattern database eliminates most LLM calls ($0 cost vs $20+/month)
- **Pattern Database**: Pre-built selectors for Twitter/X, GitHub, and common patterns
- **Semantic Analysis**: Fallback using HTML semantics and ARIA labels
- **Confidence Scoring**: Reliable element detection with quality metrics
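
The lookup order described above (site-specific pattern, then universal pattern, then semantic fallback) can be sketched like this. Every selector, intent name, and confidence value below is invented for illustration and does not reflect the contents of OpenDia's real pattern database.

```javascript
// Sketch: pattern-database lookup with a semantic fallback and confidence scores.
// Every selector, intent name, and score below is made up for illustration.
const PATTERNS = {
  "github.com": { search_input: { selector: "input[name='q']", confidence: 0.95 } },
  "*": { search_input: { selector: "input[type='search']", confidence: 0.7 } },
};

function resolveElement(hostname, intent) {
  // 1. Site-specific pattern, 2. universal pattern.
  const hit = (PATTERNS[hostname] || {})[intent] || PATTERNS["*"][intent];
  if (hit) return { ...hit, source: "pattern" };
  // 3. Semantic fallback: guess from ARIA labels, with lower confidence.
  const label = intent.replace(/_/g, " ");
  return { selector: `[aria-label*='${label}' i]`, confidence: 0.4, source: "semantic" };
}

console.log(resolveElement("github.com", "search_input"));
console.log(resolveElement("example.org", "post_button"));
```

A caller can use the returned `confidence` to decide whether to act on a match directly or run a more detailed analysis first.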
### Visual Testing Interface
- **Real-time Testing**: Test content extraction and page analysis
- **Element Highlighting**: Visual feedback with confidence-based colors
- **Performance Metrics**: Execution time and data size monitoring
- **JSON Viewer**: Full result inspection and debugging
## Project Structure
```
opendia/
├── opendia-extension/    # Chrome extension files
│   ├── manifest.json     # Extension configuration
│   ├── background.js     # Background service worker
│   ├── content.js        # Content scripts
│   ├── popup.html        # Extension popup UI
│   └── popup.js          # Popup functionality
├── opendia-mcp/          # MCP server implementation
│   ├── package.json      # Server dependencies
│   ├── server.js         # MCP server logic
│   └── .env              # Environment configuration
└── README.md
```
## Contributing
1. **Adding New Browser Functions**
   - Add the tool definition in `getAvailableTools()` in `background.js`
   - Implement the handler in `handleMCPRequest()`
   - The tool will automatically be registered with the MCP server

2. **Development Workflow**
   - Modify extension files in the `opendia-extension` directory
   - Reload the extension in Chrome to see changes
   - Test new functionality through the MCP interface

3. **Security Considerations**
   - Review and limit permissions based on needs
   - The WebSocket server runs on localhost only by default
   - Be cautious when executing scripts or accessing sensitive data
   - Consider adding authentication between the extension and MCP server
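
For step 1, adding a tool might look roughly like the sketch below. Only the function names `getAvailableTools()` and `handleMCPRequest()` come from this README. The example tool (`page_get_title`), the message shape, and the injected `browser` object are assumptions, chosen so the sketch runs outside Chrome; check `background.js` for the real structure before copying anything.

```javascript
// Sketch of the two steps for adding a tool. The function names come from the
// README; the tool (page_get_title), the message shape, and the injected
// `browser` object are assumptions so the example can run outside Chrome.
function getAvailableTools() {
  return [
    {
      name: "page_get_title",
      description: "Return the title of the active tab",
      // MCP tools describe their arguments with a JSON Schema.
      inputSchema: { type: "object", properties: {}, required: [] },
    },
  ];
}

async function handleMCPRequest(message, browser) {
  switch (message.name) {
    case "page_get_title":
      return { title: await browser.getActiveTabTitle() };
    default:
      throw new Error(`Unknown tool: ${message.name}`);
  }
}

// Stand-in for the browser APIs the real handler would call.
const fakeBrowser = { getActiveTabTitle: async () => "Example Domain" };
const titleResult = handleMCPRequest({ name: "page_get_title", arguments: {} }, fakeBrowser);
titleResult.then((r) => console.log(r));
```

Keeping the tool definition and its handler case adjacent in review makes it easier to spot a tool that is advertised but not implemented.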
## Troubleshooting
- **Extension not connecting**: Check that the MCP server is running on port 3000
- **Tools not available**: Verify the extension is loaded and check the popup for connection status
- **Permission errors**: Ensure the extension has the necessary permissions in manifest.json
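
To check the first item quickly, a small Node script can probe whether anything is listening on the port. This is a minimal sketch: port 3000 is taken from the note above, `isPortOpen` is a hypothetical helper, and a successful connection only shows that some process is listening, not that it is the OpenDia server.

```javascript
// Sketch: check whether something is listening on the MCP server port.
// Port 3000 comes from the troubleshooting note; isPortOpen is a hypothetical helper.
const net = require("net");

function isPortOpen(port, host = "127.0.0.1", timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open) => {
      socket.destroy(); // release the socket whichever way the probe ended
      resolve(open);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}

isPortOpen(3000).then((open) => {
  console.log(open ? "Something is listening on port 3000" : "Nothing is listening on port 3000");
});
```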
## License
MIT License
## Disclaimer
This software is provided "as is", without warranty of any kind, express or implied. In no event shall the authors or copyright holders be liable for any claim, damages or other liability arising from the use of this software.