# /troubleshoot - Debug and resolve issues systematically

## Legend
| Symbol | Meaning | | Abbrev | Meaning |
|--------|---------|---|--------|---------|
| → | leads to | | cfg | configuration |
| & | and/with | | impl | implementation |
| w/ | with | | perf | performance |
| @ | at/located | | ops | operations |
| > | greater than | | val | validation |
| ∀ | for all/every | | req | requirements |
| ∃ | exists/there is | | deps | dependencies |
| ∴ | therefore | | env | environment |
| ∵ | because | | db | database |
| ≡ | equivalent | | api | interface |
| ≈ | approximately | | docs | documentation |
| 📁 | directory/path | | std | standard |
| 🔢 | number/count | | def | default |
| 📝 | text/string | | ctx | context |
| ⚙ | setting/config | | err | error |
| 🎛 | control/flags | | exec | execution |
| 🔧 | configuration | | qual | quality |
| 📋 | group/category | | rec | recovery |
| 🚨 | critical/urgent | | sev | severity |
| ⚠ | warning/caution | | resp | response |
| 🔄 | retry/recovery | | esc | escalation |
| ✅ | success/fixed | | tok | token |
| ❌ | failure/error | | opt | optimization |
| ℹ | information | | UX | user experience |
| ⚡ | fast/quick | | UI | user interface |
| 🐌 | slow/delayed | | C | critical |
| ✨ | complete/done | | H | high |
| 📖 | read operation | | M | medium |
| ✏ | edit operation | | L | low |
| 🗑 | delete operation | | |

## Purpose
Debug and resolve issues in code or systems specified in $ARGUMENTS using systematic troubleshooting methodologies and analysis techniques.

## Syntax
`/troubleshoot [flags] [issue-description]`

## Universal Flags
--plan: "Show execution plan before running"
--uc: "UltraCompressed mode (~70% token reduction)"
--ultracompressed: "Alias for --uc"
--think: "Multi-file analysis w/ context (4K tokens)"
--think-hard: "Deep architectural analysis (10K tokens)"
--ultrathink: "Critical system redesign (32K tokens)"
--c7: "Enable Context7→library documentation lookup"
--seq: "Enable Sequential→complex analysis & thinking"
--magic: "Enable Magic→UI component generation"
--pup: "Enable Puppeteer→browser automation & testing"
--all-mcp: "Enable all MCP servers"
--no-mcp: "Disable all MCP servers (native tools only)"
--no-c7: "Disable Context7 specifically"
--no-seq: "Disable Sequential thinking specifically"
--no-magic: "Disable Magic UI builder specifically"
--no-pup: "Disable Puppeteer specifically"

## Command-Specific Flags

**Troubleshooting Modes:**
- `--investigate`: Focus on understanding and analyzing issues without immediate fixes
- `--fix`: Complete bug-fixing workflow with testing and verification (default)
- `--analyze`: Deep technical analysis of complex system interactions
- `--diagnose`: Systematic diagnostic approach with structured methodology

**Analysis Methods:**
- `--five-whys`: Apply root cause analysis methodology iteratively
- `--binary-search`: Use binary search approach to isolate problem scope
- `--timeline`: Analyze issue timeline and recent changes
- `--dependencies`: Focus on dependency and integration issues

**Environment Focus:**
- `--prod`: Production-specific issue handling with minimal disruption
- `--staging`: Staging environment debugging and testing
- `--local`: Local development environment troubleshooting
- `--cross-env`: Cross-environment consistency analysis

**Investigation Tools:**
- `--logs`: Focus on log analysis and pattern detection
- `--performance`: Performance profiling and bottleneck analysis
- `--security`: Security-focused investigation and vulnerability analysis
- `--network`: Network connectivity and API integration debugging

## Examples
- `/troubleshoot --investigate --logs --think` → Log analysis with context
- `/troubleshoot --five-whys --prod --think-hard` → Production root cause analysis
- `/troubleshoot --fix --performance --ultrathink` → Performance issue resolution
- `/troubleshoot --binary-search --dependencies` → Systematic dependency debugging
- `/troubleshoot --analyze --security --network` → Security and network analysis

## Troubleshooting Workflow

**1. Issue Reproduction and Understanding**
- **Minimal Reproduction**: Create smallest possible reproduction case
- **Behavior Documentation**: Document expected vs actual behavior clearly
- **Impact Assessment**: Identify affected components, users, and business impact
- **Severity Classification**: Determine urgency and priority level
- **Environment Analysis**: Understand where and when the issue occurs

**2. Investigation and Isolation**
- **Tool Utilization**: Apply debugging tools and strategic logging
- **Scope Narrowing**: Use binary search to isolate problem area
- **Change Analysis**: Review recent changes using git history and blame
- **Data Collection**: Analyze logs, stack traces, and monitoring data
- **Factor Elimination**: Rule out environmental and configuration factors

**3. Root Cause Analysis**
- **Underlying Causes**: Look beyond symptoms to find root causes
- **Five-Whys Method**: Apply iterative questioning technique
- **Systemic Analysis**: Consider broader system and process issues
- **Contributing Factors**: Document all factors that led to the issue
- **Pattern Recognition**: Identify similar issues and common causes

**4. Solution Development (--fix mode)**
- **Test Creation**: Write failing test that reproduces the issue
- **Minimal Fix**: Implement focused solution addressing root cause
- **Compatibility**: Ensure backward compatibility and minimal disruption
- **Edge Cases**: Consider side effects and edge case scenarios
- **Code Review**: Apply standard code review and quality practices

**5. Verification and Prevention**
- **Fix Validation**: Verify solution resolves issue completely
- **Regression Testing**: Run full test suite to prevent regressions
- **Realistic Testing**: Test in production-like conditions
- **Monitoring**: Add monitoring and alerting for early detection
- **Documentation**: Record lessons learned and prevention measures

## Investigation Techniques

**Debugging Approaches:**
- **Strategic Logging**: Add targeted logging at key decision points
- **Breakpoint Analysis**: Use debugger breakpoints for step-through analysis
- **State Inspection**: Examine variable states and data structures
- **Call Stack Analysis**: Trace execution paths and function calls
- **Memory Debugging**: Analyze memory usage and potential leaks

**Performance Analysis:**
- **Profiling Tools**: Use performance profilers for bottleneck identification
- **Resource Monitoring**: Track CPU, memory, and I/O usage patterns
- **Query Analysis**: Analyze database queries and execution plans
- **Network Inspection**: Monitor network requests and response times
- **Caching Evaluation**: Assess caching effectiveness and hit rates

**System Analysis:**
- **Configuration Review**: Examine system and application configurations
- **Dependency Mapping**: Map and analyze component dependencies
- **Integration Testing**: Test inter-service communication and APIs
- **Infrastructure Analysis**: Review infrastructure and deployment setup
- **Security Assessment**: Analyze security configurations and access controls

## Five-Whys Analysis (--five-whys)

**Methodology:**
1. **Problem Statement**: Clearly define the observed problem
2. **First Why**: Why did this problem occur? (immediate cause)
3. **Second Why**: Why did that cause occur? (deeper cause)
4. **Third Why**: Why did that deeper cause occur? (root cause)
5. **Fourth Why**: Why does that root cause exist? (systemic cause)
6. **Fifth Why**: Why is that system in place? (organizational cause)

**Documentation:**
- Record each level of analysis with evidence
- Document contributing factors at each level
- Identify prevention measures for each cause
- Propose systemic improvements to prevent recurrence
- Create action items for short-term and long-term fixes

## Production Issue Handling (--prod)

**Production-Specific Considerations:**
- **Minimal Disruption**: Prioritize system stability and user experience
- **Rollback Readiness**: Prepare immediate rollback options
- **Monitoring Integration**: Use existing monitoring and alerting systems
- **Communication**: Maintain stakeholder communication throughout
- **Documentation**: Record all changes and decisions for audit trail

**Production Analysis:**
- **Deployment Correlation**: Correlate issues with recent deployments
- **Traffic Patterns**: Analyze user traffic and usage patterns
- **Configuration Changes**: Review recent configuration modifications
- **Resource Utilization**: Monitor system resource usage and limits
- **Service Dependencies**: Check health of dependent services

**Safe Production Practices:**
- **Feature Flags**: Use feature toggles to isolate problematic features
- **Gradual Rollout**: Implement fixes gradually with monitoring
- **A/B Testing**: Compare fix effectiveness with control groups
- **Circuit Breakers**: Implement circuit breakers for failing services
- **Health Checks**: Continuous health monitoring during fixes

## Investigation Tools and Techniques

**Logging and Monitoring:**
- Centralized log aggregation and analysis
- Real-time monitoring dashboards and alerts
- Distributed tracing for microservices
- Application performance monitoring (APM)
- Custom metrics and business intelligence

**Debugging Tools:**
- Interactive debuggers and IDE integration
- Remote debugging capabilities
- Memory profilers and leak detectors
- Performance profiling tools
- Network traffic analyzers

**Testing and Validation:**
- Unit test creation for bug reproduction
- Integration testing for component interactions
- Load testing for performance issues
- Security testing for vulnerability assessment
- Chaos engineering for resilience testing

## Deliverables
- **Investigation Report**: Comprehensive analysis of issue and findings
- **Root Cause Analysis**: Detailed five-whys analysis with evidence
- **Solution Documentation**: Fix implementation with rationale
- **Prevention Plan**: Measures to prevent similar issues
- **Monitoring Enhancements**: Improved detection and alerting
- **Lessons Learned**: Knowledge base updates and team learnings

## Output Locations
- **Incident Reports**: `.claudedocs/incidents/rca-{issue}-{timestamp}.md`
- **Investigation Logs**: `.claudedocs/reports/troubleshoot-{timestamp}.md`
- **Solution Documentation**: `.claudedocs/summaries/fix-{issue}-{timestamp}.md`

## Research Requirements
External_Library_Research:
  - Identify library/framework mentioned
  - Context7 lookup for official documentation
  - Verify API patterns and examples
  - Check version compatibility
  - Document findings in implementation
Pattern_Research:
  - Search existing codebase for similar patterns
  - Magic component search if UI-related
  - WebSearch for official documentation
  - Validate approach with Sequential thinking
  - Document pattern choice rationale
API_Integration_Research:
  - Official documentation lookup
  - Authentication requirements
  - Rate limiting and error handling
  - SDK availability and examples
  - Integration testing approach

## Report Notifications
📄 Analysis report saved to: {path}
📊 Metrics updated: {path}
📋 Summary saved to: {path}
💾 Checkpoint created: {path}
📚 Documentation created: {path}
📁 Created directory: {path}
✅ {operation} completed successfully
❌ {operation} failed: {reason}
⚠ {operation} completed w/ warnings

## Best Practices

**Systematic Approach:**
- Follow structured troubleshooting methodology
- Document findings and decisions throughout process
- Maintain objectivity and avoid assumption-based debugging
- Use data and evidence to drive investigation
- Consider multiple hypotheses before settling on solutions

**Collaboration and Communication:**
- Involve relevant team members and stakeholders
- Communicate status and findings regularly
- Share knowledge and learnings with team
- Document solutions for future reference
- Conduct post-incident reviews for improvement

**Prevention Focus:**
- Address root causes, not just symptoms
- Implement monitoring and alerting improvements
- Update processes and procedures based on learnings
- Enhance testing and quality assurance practices
- Build resilience and error handling into systems

## Common Error Scenarios

### Database Connection Issues
```bash
/troubleshoot --investigate --dependencies "connection timeout"
# → Checks DB connectivity, credentials, network latency
# → Verifies connection pool settings and timeouts
# → Tests failover mechanisms and retry logic
```

### Memory Leaks & Performance
```bash
/troubleshoot --performance --logs --binary-search
# → Profiles memory usage patterns over time
# → Identifies allocation hotspots and retention issues
# → Implements heap dump analysis and GC tuning
```

### Production Emergencies
```bash
/troubleshoot --prod --investigate --timeline --critical
# → Creates incident timeline with system events
# → Preserves logs and state for post-mortem analysis
# → Implements safe rollback procedures if needed
```

### Integration & API Failures
```bash
/troubleshoot --dependencies --network --cross-env
# → Tests API endpoints and service dependencies
# → Validates authentication and authorization flows
# → Checks rate limiting and circuit breaker status
```

## Troubleshooting
- **Complex Issues**: Use `--ultrathink --five-whys` for comprehensive analysis
- **Production Emergencies**: Apply `--prod --investigate --timeline` for safe handling
- **Performance Problems**: Combine `--performance --logs --binary-search`
- **Integration Issues**: Use `--dependencies --network --cross-env`

## Success Messages
✅ {operation} completed successfully
📝 Created: {file_path}
✏ Updated: {file_path}
✨ Task completed: {task_title}