2025-06-24 21:24:14 +02:00
# /troubleshoot - Debug and resolve issues systematically
## Legend

| Symbol | Meaning | | Abbrev | Meaning |
|--------|---------|---|--------|---------|
| → | leads to | | cfg | configuration |
| & | and/with | | impl | implementation |
| w/ | with | | perf | performance |
| @ | at/located | | ops | operations |
| > | greater than | | val | validation |
| ∀ | for all/every | | req | requirements |
| ∃ | exists/there is | | deps | dependencies |
| ∴ | therefore | | env | environment |
| ∵ | because | | db | database |
| ≡ | equivalent | | api | interface |
| ≈ | approximately | | docs | documentation |
| 📁 | directory/path | | std | standard |
| 🔢 | number/count | | def | default |
| 📝 | text/string | | ctx | context |
| ⚙ | setting/config | | err | error |
| 🎛 | control/flags | | exec | execution |
| 🔧 | configuration | | qual | quality |
| 📋 | group/category | | rec | recovery |
| 🚨 | critical/urgent | | sev | severity |
| ⚠ | warning/caution | | resp | response |
| 🔄 | retry/recovery | | esc | escalation |
| ✅ | success/fixed | | tok | token |
| ❌ | failure/error | | opt | optimization |
| ℹ | information | | UX | user experience |
| ⚡ | fast/quick | | UI | user interface |
| 🐌 | slow/delayed | | C | critical |
| ✨ | complete/done | | H | high |
| 📖 | read operation | | M | medium |
| ✏ | edit operation | | L | low |
| 🗑 | delete operation | | | |

## Purpose

Debug and resolve issues in code or systems specified in `$ARGUMENTS` using systematic troubleshooting methodologies and analysis techniques.

## Syntax

`/troubleshoot [flags] [issue-description]`

## Universal Flags

- `--plan`: Show execution plan before running
- `--uc`: UltraCompressed mode (~70% token reduction)
- `--ultracompressed`: Alias for `--uc`
- `--think`: Multi-file analysis w/ context (4K tokens)
- `--think-hard`: Deep architectural analysis (10K tokens)
- `--ultrathink`: Critical system redesign (32K tokens)
- `--c7`: Enable Context7→library documentation lookup
- `--seq`: Enable Sequential→complex analysis & thinking
- `--magic`: Enable Magic→UI component generation
- `--pup`: Enable Puppeteer→browser automation & testing
- `--all-mcp`: Enable all MCP servers
- `--no-mcp`: Disable all MCP servers (native tools only)
- `--no-c7`: Disable Context7 specifically
- `--no-seq`: Disable Sequential thinking specifically
- `--no-magic`: Disable Magic UI builder specifically
- `--no-pup`: Disable Puppeteer specifically

## Command-Specific Flags

**Troubleshooting Modes:**
- `--investigate`: Focus on understanding and analyzing issues without immediate fixes
- `--fix`: Complete bug-fixing workflow with testing and verification (default)
- `--analyze`: Deep technical analysis of complex system interactions
- `--diagnose`: Systematic diagnostic approach with structured methodology

**Analysis Methods:**
- `--five-whys`: Apply root cause analysis methodology iteratively
- `--binary-search`: Use binary search approach to isolate problem scope
- `--timeline`: Analyze issue timeline and recent changes
- `--dependencies`: Focus on dependency and integration issues

**Environment Focus:**
- `--prod`: Production-specific issue handling with minimal disruption
- `--staging`: Staging environment debugging and testing
- `--local`: Local development environment troubleshooting
- `--cross-env`: Cross-environment consistency analysis

**Investigation Tools:**
- `--logs`: Focus on log analysis and pattern detection
- `--performance`: Performance profiling and bottleneck analysis
- `--security`: Security-focused investigation and vulnerability analysis
- `--network`: Network connectivity and API integration debugging

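
As a rough illustration of what the log analysis and pattern detection behind `--logs` could involve, the sketch below groups error lines by a normalized message signature. The log format, the `signature` helper, and the sample lines are assumptions made for this example and are not part of the command itself.

```python
import re
from collections import Counter

# Assumed log format: "<timestamp> <LEVEL> <message>".
LINE_RE = re.compile(r"^\S+ (?P<level>[A-Z]+) (?P<message>.*)$")


def signature(message: str) -> str:
    """Collapse variable parts (numbers) so similar messages group together."""
    return re.sub(r"\d+", "<n>", message)


def count_error_patterns(lines):
    """Return a Counter of normalized ERROR message signatures."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            counts[signature(match.group("message"))] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    "2025-06-24T21:24:14 ERROR timeout after 30 s on shard 4",
    "2025-06-24T21:24:15 INFO request served",
    "2025-06-24T21:24:16 ERROR timeout after 45 s on shard 7",
]
patterns = count_error_patterns(sample)
```

Normalizing numbers before counting is one simple way to surface a recurring failure that would otherwise look like many distinct messages.
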
## Examples

- `/troubleshoot --investigate --logs --think` → Log analysis with context
- `/troubleshoot --five-whys --prod --think-hard` → Production root cause analysis
- `/troubleshoot --fix --performance --ultrathink` → Performance issue resolution
- `/troubleshoot --binary-search --dependencies` → Systematic dependency debugging
- `/troubleshoot --analyze --security --network` → Security and network analysis

## Troubleshooting Workflow

**1. Issue Reproduction and Understanding**
- **Minimal Reproduction**: Create smallest possible reproduction case
- **Behavior Documentation**: Document expected vs actual behavior clearly
- **Impact Assessment**: Identify affected components, users, and business impact
- **Severity Classification**: Determine urgency and priority level
- **Environment Analysis**: Understand where and when the issue occurs

**2. Investigation and Isolation**
- **Tool Utilization**: Apply debugging tools and strategic logging
- **Scope Narrowing**: Use binary search to isolate problem area
- **Change Analysis**: Review recent changes using git history and blame
- **Data Collection**: Analyze logs, stack traces, and monitoring data
- **Factor Elimination**: Rule out environmental and configuration factors

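
The scope-narrowing step can be sketched as a binary search over an ordered change history. This is a minimal illustration, assuming the history is ordered oldest to newest, the newest revision is known to be bad, and the defect persists once introduced; `first_bad`, `is_bad`, and the revision names are hypothetical. `git bisect` automates the same search over real commits.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, the last one is bad,
    and once a revision is bad every later one is bad too.
    """
    lo, hi = 0, len(revisions) - 1  # hi always points at a known-bad revision
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return revisions[lo]


# Invented history: the regression was introduced in "r5".
history = [f"r{i}" for i in range(1, 9)]
culprit = first_bad(history, lambda rev: int(rev[1:]) >= 5)
```

Each check halves the remaining range, so eight revisions need at most three checks instead of eight.
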

**3. Root Cause Analysis**
- **Underlying Causes**: Look beyond symptoms to find root causes
- **Five-Whys Method**: Apply iterative questioning technique
- **Systemic Analysis**: Consider broader system and process issues
- **Contributing Factors**: Document all factors that led to the issue
- **Pattern Recognition**: Identify similar issues and common causes

**4. Solution Development (--fix mode)**
- **Test Creation**: Write failing test that reproduces the issue
- **Minimal Fix**: Implement focused solution addressing root cause
- **Compatibility**: Ensure backward compatibility and minimal disruption
- **Edge Cases**: Consider side effects and edge case scenarios
- **Code Review**: Apply standard code review and quality practices

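
The test-first part of this step might look like the sketch below: a regression test that reproduces the bug, a focused fix, and a second test guarding existing behavior. The `average` function and its bug are invented for illustration only.

```python
import unittest


def average(values):
    """Hypothetical function under repair.

    The original version divided by len(values) unconditionally and
    raised ZeroDivisionError on empty input.
    """
    if not values:  # minimal fix addressing the root cause
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_returns_zero(self):
        # Written first to reproduce the bug; it failed before the fix.
        self.assertEqual(average([]), 0.0)

    def test_existing_behaviour_is_unchanged(self):
        # Guards backward compatibility for the normal case.
        self.assertEqual(average([2, 4]), 3.0)
```

Saved as a module, the tests can be run with `python -m unittest`. Keeping the reproducing test in the suite turns the bug report into a permanent regression check.
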

**5. Verification and Prevention**
- **Fix Validation**: Verify solution resolves issue completely
- **Regression Testing**: Run full test suite to prevent regressions
- **Realistic Testing**: Test in production-like conditions
- **Monitoring**: Add monitoring and alerting for early detection
- **Documentation**: Record lessons learned and prevention measures

## Investigation Techniques

**Debugging Approaches:**
- **Strategic Logging**: Add targeted logging at key decision points
- **Breakpoint Analysis**: Use debugger breakpoints for step-through analysis
- **State Inspection**: Examine variable states and data structures
- **Call Stack Analysis**: Trace execution paths and function calls
- **Memory Debugging**: Analyze memory usage and potential leaks

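
Strategic logging, in particular, can be illustrated with Python's standard `logging` module. The component name, the `apply_discount` function, and the discount rule are hypothetical; the point is only where the log calls sit relative to the decision.

```python
import logging

logger = logging.getLogger("checkout")  # hypothetical component name


def apply_discount(total: float, code: str) -> float:
    """Hypothetical function with logging at its decision point."""
    logger.debug("apply_discount called: total=%.2f code=%r", total, code)
    if code == "SAVE10":
        discounted = round(total * 0.9, 2)
        logger.info("discount applied: %.2f -> %.2f", total, discounted)
        return discounted
    logger.warning("unknown discount code %r, total unchanged", code)
    return total


logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
apply_discount(100.0, "SAVE10")
apply_discount(100.0, "BOGUS")
```

Logging the inputs on entry and the branch taken at each decision is usually enough to reconstruct what happened without attaching a debugger.
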

**Performance Analysis:**
- **Profiling Tools**: Use performance profilers for bottleneck identification
- **Resource Monitoring**: Track CPU, memory, and I/O usage patterns
- **Query Analysis**: Analyze database queries and execution plans
- **Network Inspection**: Monitor network requests and response times
- **Caching Evaluation**: Assess caching effectiveness and hit rates

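
For the profiling item, a minimal sketch using Python's standard `cProfile` and `pstats` modules is shown below. The `slow_sum` workload and the `profile_call` helper are invented for this example; a real investigation would profile the suspect code path instead.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Hypothetical hotspot: deliberately inefficient summation."""
    total = 0
    for i in range(n):
        total += sum(range(i % 100))
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 2000)
print(report)
```

Sorting by cumulative time puts the functions that dominate the call tree at the top, which is usually where bottleneck analysis should start.
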

**System Analysis:**
- **Configuration Review**: Examine system and application configurations
- **Dependency Mapping**: Map and analyze component dependencies
- **Integration Testing**: Test inter-service communication and APIs
- **Infrastructure Analysis**: Review infrastructure and deployment setup
- **Security Assessment**: Analyze security configurations and access controls

## Five-Whys Analysis (--five-whys)

**Methodology:**
1. **Problem Statement**: Clearly define the observed problem
2. **First Why**: Why did this problem occur? (immediate cause)
3. **Second Why**: Why did that cause occur? (deeper cause)
4. **Third Why**: Why did that deeper cause occur? (root cause)
5. **Fourth Why**: Why does that root cause exist? (systemic cause)
6. **Fifth Why**: Why is that system in place? (organizational cause)

**Documentation:**
- Record each level of analysis with evidence
- Document contributing factors at each level
- Identify prevention measures for each cause
- Propose systemic improvements to prevent recurrence
- Create action items for short-term and long-term fixes

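
One possible way to record such an analysis with evidence at each level is sketched below. The `Why` and `FiveWhys` classes, their fields, and the sample incident are assumptions for illustration, not a format the command prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    question: str
    answer: str
    evidence: str  # log excerpt, commit, metric, and so on


@dataclass
class FiveWhys:
    problem: str
    whys: list = field(default_factory=list)

    def ask(self, answer: str, evidence: str) -> None:
        """Record the next level; each question targets the previous answer."""
        if len(self.whys) >= 5:
            raise ValueError("five levels already recorded")
        target = self.whys[-1].answer if self.whys else self.problem
        self.whys.append(Why(f"Why: {target}?", answer, evidence))

    def to_markdown(self) -> str:
        lines = [f"Problem: {self.problem}"]
        for level, why in enumerate(self.whys, start=1):
            lines.append(f"{level}. {why.question} {why.answer} (evidence: {why.evidence})")
        return "\n".join(lines)


# Entirely invented incident, for illustration only.
rca = FiveWhys("checkout requests time out")
rca.ask("the DB connection pool is exhausted", "pool metrics")
rca.ask("connections are not returned after errors", "code review")
print(rca.to_markdown())
```

Deriving each question from the previous answer keeps the chain causal, and requiring an evidence string at every level discourages guesses.
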
## Production Issue Handling (--prod)

**Production-Specific Considerations:**
- **Minimal Disruption**: Prioritize system stability and user experience
- **Rollback Readiness**: Prepare immediate rollback options
- **Monitoring Integration**: Use existing monitoring and alerting systems
- **Communication**: Maintain stakeholder communication throughout
- **Documentation**: Record all changes and decisions for audit trail

**Production Analysis:**
- **Deployment Correlation**: Correlate issues with recent deployments
- **Traffic Patterns**: Analyze user traffic and usage patterns
- **Configuration Changes**: Review recent configuration modifications
- **Resource Utilization**: Monitor system resource usage and limits
- **Service Dependencies**: Check health of dependent services

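
Deployment correlation can be as simple as listing what shipped shortly before the incident began. The sketch below assumes deployment records are available as `(name, finished_at)` pairs and uses a two-hour look-back window; the function name, the window, and the data are all hypothetical.

```python
from datetime import datetime, timedelta


def deployments_before(incident_start, deployments, window=timedelta(hours=2)):
    """Return deployments that finished within `window` before the incident,
    most recent first. `deployments` is a list of (name, finished_at) pairs."""
    candidates = [
        (name, finished_at)
        for name, finished_at in deployments
        if timedelta(0) <= incident_start - finished_at <= window
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Invented data, for illustration only.
deploys = [
    ("api v1.4.1", datetime(2025, 6, 24, 9, 0)),
    ("worker v2.0.0", datetime(2025, 6, 24, 13, 30)),
    ("api v1.4.2", datetime(2025, 6, 24, 14, 10)),
    ("web v3.1.0", datetime(2025, 6, 24, 15, 0)),  # finished after the incident began
]
suspects = deployments_before(datetime(2025, 6, 24, 14, 20), deploys)
```

Here `suspects` holds the two deployments inside the window, newest first; a deployment close in time is a lead to verify, not proof of cause.
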

**Safe Production Practices:**
- **Feature Flags**: Use feature toggles to isolate problematic features
- **Gradual Rollout**: Implement fixes gradually with monitoring
- **A/B Testing**: Compare fix effectiveness with control groups
- **Circuit Breakers**: Implement circuit breakers for failing services
- **Health Checks**: Continuous health monitoring during fixes

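
To make the circuit breaker item concrete, here is a minimal single-threaded sketch. The class names, thresholds, and injectable `clock` are assumptions chosen for readability; production systems typically rely on an established resilience library rather than a hand-rolled breaker.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Wrapping calls to a failing dependency as `breaker.call(fetch)` makes callers fail fast while the dependency recovers, instead of piling up timeouts.
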
## Investigation Tools and Techniques

**Logging and Monitoring:**
- Centralized log aggregation and analysis
- Real-time monitoring dashboards and alerts
- Distributed tracing for microservices
- Application performance monitoring (APM)
- Custom metrics and business intelligence

**Debugging Tools:**
- Interactive debuggers and IDE integration
- Remote debugging capabilities
- Memory profilers and leak detectors
- Performance profiling tools
- Network traffic analyzers

**Testing and Validation:**
- Unit test creation for bug reproduction
- Integration testing for component interactions
- Load testing for performance issues
- Security testing for vulnerability assessment
- Chaos engineering for resilience testing

## Deliverables

- **Investigation Report**: Comprehensive analysis of issue and findings
- **Root Cause Analysis**: Detailed five-whys analysis with evidence
- **Solution Documentation**: Fix implementation with rationale
- **Prevention Plan**: Measures to prevent similar issues
- **Monitoring Enhancements**: Improved detection and alerting
- **Lessons Learned**: Knowledge base updates and team learnings

## Output Locations

- **Incident Reports**: `.claudedocs/incidents/rca-{issue}-{timestamp}.md`
- **Investigation Logs**: `.claudedocs/reports/troubleshoot-{timestamp}.md`
- **Solution Documentation**: `.claudedocs/summaries/fix-{issue}-{timestamp}.md`

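
The templates above can be filled in mechanically. The sketch below shows one way to do it; the slug rule for `{issue}` and the `%Y%m%d-%H%M%S` timestamp format are assumptions, since the document does not specify either.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Templates copied from the list above.
TEMPLATES = {
    "incident": ".claudedocs/incidents/rca-{issue}-{timestamp}.md",
    "investigation": ".claudedocs/reports/troubleshoot-{timestamp}.md",
    "solution": ".claudedocs/summaries/fix-{issue}-{timestamp}.md",
}


def slugify(text: str) -> str:
    """Assumed convention: lowercase words joined by hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def output_path(kind: str, issue: str, when: datetime) -> PurePosixPath:
    """Fill a template; the timestamp format is an assumption."""
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return PurePosixPath(TEMPLATES[kind].format(issue=slugify(issue), timestamp=stamp))


path = output_path("incident", "Connection timeout", datetime(2025, 6, 24, 21, 24, 14))
```
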
## Research Requirements

**External Library Research:**
- Identify library/framework mentioned
- Context7 lookup for official documentation
- Verify API patterns and examples
- Check version compatibility
- Document findings in implementation

**Pattern Research:**
- Search existing codebase for similar patterns
- Magic component search if UI-related
- WebSearch for official documentation
- Validate approach with Sequential thinking
- Document pattern choice rationale

**API Integration Research:**
- Official documentation lookup
- Authentication requirements
- Rate limiting and error handling
- SDK availability and examples
- Integration testing approach

## Report Notifications

- 📄 Analysis report saved to: {path}
- 📊 Metrics updated: {path}
- 📋 Summary saved to: {path}
- 💾 Checkpoint created: {path}
- 📚 Documentation created: {path}
- 📁 Created directory: {path}
- ✅ {operation} completed successfully
- ❌ {operation} failed: {reason}
- ⚠ {operation} completed w/ warnings

## Best Practices

**Systematic Approach:**
- Follow structured troubleshooting methodology
- Document findings and decisions throughout process
- Maintain objectivity and avoid assumption-based debugging
- Use data and evidence to drive investigation
- Consider multiple hypotheses before settling on solutions

**Collaboration and Communication:**
- Involve relevant team members and stakeholders
- Communicate status and findings regularly
- Share knowledge and learnings with team
- Document solutions for future reference
- Conduct post-incident reviews for improvement

**Prevention Focus:**
- Address root causes, not just symptoms
- Implement monitoring and alerting improvements
- Update processes and procedures based on learnings
- Enhance testing and quality assurance practices
- Build resilience and error handling into systems

## Common Error Scenarios
### Database Connection Issues
```bash
/troubleshoot --investigate --dependencies "connection timeout"
# → Checks DB connectivity, credentials, network latency
# → Verifies connection pool settings and timeouts
# → Tests failover mechanisms and retry logic
```
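
The retry logic mentioned in the last comment above is often implemented as retry with exponential backoff. The sketch below is a generic illustration, not what the command runs; the `retry` helper, its defaults, and the flaky `connect` stand-in are hypothetical.

```python
import time


def retry(func, attempts=3, base_delay=0.5, sleep=time.sleep,
          retry_on=(ConnectionError, TimeoutError)):
    """Call func until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky connection that succeeds on the third try.
calls = {"count": 0}


def connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("connection timeout")
    return "connected"


delays = []
outcome = retry(connect, sleep=delays.append)  # record delays instead of sleeping
```

Passing `sleep` as a parameter keeps the helper testable: the example records the delays it would have waited rather than actually pausing.
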
### Memory Leaks & Performance
```bash
/troubleshoot --performance --logs --binary-search
# → Profiles memory usage patterns over time
# → Identifies allocation hotspots and retention issues
# → Performs heap dump analysis and GC tuning
```
### Production Emergencies
```bash
/troubleshoot --prod --investigate --timeline
# → Creates incident timeline with system events
# → Preserves logs and state for post-mortem analysis
# → Prepares safe rollback procedures if needed
```
### Integration & API Failures
```bash
/troubleshoot --dependencies --network --cross-env
# → Tests API endpoints and service dependencies
# → Validates authentication and authorization flows
# → Checks rate limiting and circuit breaker status
```
## Troubleshooting

- **Complex Issues**: Use `--ultrathink --five-whys` for comprehensive analysis
- **Production Emergencies**: Apply `--prod --investigate --timeline` for safe handling
- **Performance Problems**: Combine `--performance --logs --binary-search`
- **Integration Issues**: Use `--dependencies --network --cross-env`

## Success Messages

- ✅ {operation} completed successfully
- 📝 Created: {file_path}
- ✏ Updated: {file_path}
- ✨ Task completed: {task_title}