/troubleshoot - Debug and resolve issues systematically
Legend
| Symbol | Meaning | Abbrev | Meaning |
|---|---|---|---|
| → | leads to | cfg | configuration |
| & | and/with | impl | implementation |
| w/ | with | perf | performance |
| @ | at/located | ops | operations |
| > | greater than | val | validation |
| ∀ | for all/every | req | requirements |
| ∃ | exists/there is | deps | dependencies |
| ∴ | therefore | env | environment |
| ∵ | because | db | database |
| ≡ | equivalent | api | interface |
| ≈ | approximately | docs | documentation |
| 📁 | directory/path | std | standard |
| 🔢 | number/count | def | default |
| 📝 | text/string | ctx | context |
| ⚙ | setting/config | err | error |
| 🎛 | control/flags | exec | execution |
| 🔧 | configuration | qual | quality |
| 📋 | group/category | rec | recovery |
| 🚨 | critical/urgent | sev | severity |
| ⚠ | warning/caution | resp | response |
| 🔄 | retry/recovery | esc | escalation |
| ✅ | success/fixed | tok | token |
| ❌ | failure/error | opt | optimization |
| ℹ | information | UX | user experience |
| ⚡ | fast/quick | UI | user interface |
| 🐌 | slow/delayed | C | critical |
| ✨ | complete/done | H | high |
| 📖 | read operation | M | medium |
| ✏ | edit operation | L | low |
| 🗑 | delete operation | | |
Purpose
Debug and resolve issues in code or systems specified in $ARGUMENTS using systematic troubleshooting methodologies and analysis techniques.
Syntax
/troubleshoot [flags] [issue-description]
Universal Flags
--plan: "Show execution plan before running"
--uc: "UltraCompressed mode (~70% token reduction)"
--ultracompressed: "Alias for --uc"
--think: "Multi-file analysis w/ context (4K tokens)"
--think-hard: "Deep architectural analysis (10K tokens)"
--ultrathink: "Critical system redesign (32K tokens)"
--c7: "Enable Context7→library documentation lookup"
--seq: "Enable Sequential→complex analysis & thinking"
--magic: "Enable Magic→UI component generation"
--pup: "Enable Puppeteer→browser automation & testing"
--all-mcp: "Enable all MCP servers"
--no-mcp: "Disable all MCP servers (native tools only)"
--no-c7: "Disable Context7 specifically"
--no-seq: "Disable Sequential thinking specifically"
--no-magic: "Disable Magic UI builder specifically"
--no-pup: "Disable Puppeteer specifically"
Command-Specific Flags
Troubleshooting Modes:
- --investigate: Focus on understanding and analyzing issues without immediate fixes
- --fix: Complete bug-fixing workflow with testing and verification (default)
- --analyze: Deep technical analysis of complex system interactions
- --diagnose: Systematic diagnostic approach with structured methodology
Analysis Methods:
- --five-whys: Apply root cause analysis methodology iteratively
- --binary-search: Use binary search approach to isolate problem scope
- --timeline: Analyze issue timeline and recent changes
- --dependencies: Focus on dependency and integration issues
Environment Focus:
- --prod: Production-specific issue handling with minimal disruption
- --staging: Staging environment debugging and testing
- --local: Local development environment troubleshooting
- --cross-env: Cross-environment consistency analysis
Investigation Tools:
- --logs: Focus on log analysis and pattern detection
- --performance: Performance profiling and bottleneck analysis
- --security: Security-focused investigation and vulnerability analysis
- --network: Network connectivity and API integration debugging
Examples
/troubleshoot --investigate --logs --think → Log analysis with context
/troubleshoot --five-whys --prod --think-hard → Production root cause analysis
/troubleshoot --fix --performance --ultrathink → Performance issue resolution
/troubleshoot --binary-search --dependencies → Systematic dependency debugging
/troubleshoot --analyze --security --network → Security and network analysis
Troubleshooting Workflow
1. Issue Reproduction and Understanding
- Minimal Reproduction: Create smallest possible reproduction case
- Behavior Documentation: Document expected vs actual behavior clearly
- Impact Assessment: Identify affected components, users, and business impact
- Severity Classification: Determine urgency and priority level
- Environment Analysis: Understand where and when the issue occurs
2. Investigation and Isolation
- Tool Utilization: Apply debugging tools and strategic logging
- Scope Narrowing: Use binary search to isolate problem area
- Change Analysis: Review recent changes using git history and blame
- Data Collection: Analyze logs, stack traces, and monitoring data
- Factor Elimination: Rule out environmental and configuration factors
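As a rough illustration of the scope-narrowing step, the sketch below applies binary search to an ordered change history. The function name `first_bad`, the commit labels, and the `is_bad` predicate are hypothetical; the sketch assumes the history is ordered oldest to newest, that the newest change is known to be bad, and that the issue persists once introduced. `git bisect` automates the same idea against a real repository.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first change for which is_bad() is true.

    Assumes changes are ordered oldest to newest, the last one is bad,
    and the issue persists once it appears (good...good, bad...bad).
    """
    if not changes:
        raise ValueError("changes must not be empty")
    lo, hi = 0, len(changes) - 1  # invariant: changes[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid      # the first bad change is at mid or earlier
        else:
            lo = mid + 1  # the first bad change is after mid
    return changes[lo]


# Hypothetical history: the regression arrived with commit "e5".
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7"]
culprit = first_bad(history, lambda commit: history.index(commit) >= 4)
print(culprit)
```

Each probe halves the remaining range, so seven changes need at most three checks.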
3. Root Cause Analysis
- Underlying Causes: Look beyond symptoms to find root causes
- Five-Whys Method: Apply iterative questioning technique
- Systemic Analysis: Consider broader system and process issues
- Contributing Factors: Document all factors that led to the issue
- Pattern Recognition: Identify similar issues and common causes
4. Solution Development (--fix mode)
- Test Creation: Write failing test that reproduces the issue
- Minimal Fix: Implement focused solution addressing root cause
- Compatibility: Ensure backward compatibility and minimal disruption
- Edge Cases: Consider side effects and edge case scenarios
- Code Review: Apply standard code review and quality practices
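One way to picture the test-first part of this step is a regression test written before the fix. In the sketch below, `average` and its empty-list bug are hypothetical stand-ins for the code under repair; the first test is the one that would have failed before the guard was added, and the second protects existing behavior.

```python
import unittest


def average(values):
    """Hypothetical function under repair.

    Before the fix it raised ZeroDivisionError on an empty list.
    """
    if not values:  # minimal fix: handle the empty case explicitly
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_returns_zero(self):
        # Written first to reproduce the issue; it failed until the fix landed.
        self.assertEqual(average([]), 0.0)

    def test_existing_behavior_is_unchanged(self):
        # Guards backward compatibility for the normal case.
        self.assertEqual(average([2, 4, 6]), 4.0)


# Run the two tests programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```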
5. Verification and Prevention
- Fix Validation: Verify solution resolves issue completely
- Regression Testing: Run full test suite to prevent regressions
- Realistic Testing: Test in production-like conditions
- Monitoring: Add monitoring and alerting for early detection
- Documentation: Record lessons learned and prevention measures
Investigation Techniques
Debugging Approaches:
- Strategic Logging: Add targeted logging at key decision points
- Breakpoint Analysis: Use debugger breakpoints for step-through analysis
- State Inspection: Examine variable states and data structures
- Call Stack Analysis: Trace execution paths and function calls
- Memory Debugging: Analyze memory usage and potential leaks
Performance Analysis:
- Profiling Tools: Use performance profilers for bottleneck identification
- Resource Monitoring: Track CPU, memory, and I/O usage patterns
- Query Analysis: Analyze database queries and execution plans
- Network Inspection: Monitor network requests and response times
- Caching Evaluation: Assess caching effectiveness and hit rates
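For the profiling bullet, a minimal sketch using Python's standard `cProfile` and `pstats` modules is shown below. `slow_sum` is a hypothetical hotspot and `profile_call` is an illustrative helper, not part of the command; other languages and production systems would use their own profilers.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Hypothetical hotspot: deliberately quadratic work."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 200)
print(report)
```

Sorting by cumulative time surfaces the call paths that dominate the run, which is usually the first thing to look at when hunting a bottleneck.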
System Analysis:
- Configuration Review: Examine system and application configurations
- Dependency Mapping: Map and analyze component dependencies
- Integration Testing: Test inter-service communication and APIs
- Infrastructure Analysis: Review infrastructure and deployment setup
- Security Assessment: Analyze security configurations and access controls
Five-Whys Analysis (--five-whys)
Methodology:
- Problem Statement: Clearly define the observed problem
- First Why: Why did this problem occur? (immediate cause)
- Second Why: Why did that cause occur? (deeper cause)
- Third Why: Why did that deeper cause occur? (root cause)
- Fourth Why: Why does that root cause exist? (systemic cause)
- Fifth Why: Why is that system in place? (organizational cause)
Documentation:
- Record each level of analysis with evidence
- Document contributing factors at each level
- Identify prevention measures for each cause
- Propose systemic improvements to prevent recurrence
- Create action items for short-term and long-term fixes
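The methodology and documentation points above could be captured in a small record structure. The following is a sketch only: the `Why` and `FiveWhys` classes, their fields, and the sample incident are all hypothetical, and the text layout produced by `to_markdown` is an assumption rather than a prescribed report format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Why:
    question: str
    answer: str
    evidence: str
    prevention: str = ""


@dataclass
class FiveWhys:
    problem: str
    chain: List[Why] = field(default_factory=list)

    def ask(self, answer: str, evidence: str, prevention: str = "") -> None:
        """Record the next level; each question targets the previous answer."""
        if len(self.chain) >= 5:
            raise ValueError("five levels already recorded")
        previous = self.chain[-1].answer if self.chain else self.problem
        self.chain.append(Why(f"Why: {previous}?", answer, evidence, prevention))

    def to_markdown(self) -> str:
        lines = [f"Problem: {self.problem}"]
        for level, why in enumerate(self.chain, start=1):
            lines.append(f"{level}. {why.question}")
            lines.append(f"   Answer: {why.answer}")
            lines.append(f"   Evidence: {why.evidence}")
            if why.prevention:
                lines.append(f"   Prevention: {why.prevention}")
        return "\n".join(lines)


# Hypothetical incident, for illustration only.
rca = FiveWhys("checkout requests time out")
rca.ask("the database pool was exhausted", "pool metrics at saturation")
rca.ask("connections were not released", "stack traces in worker logs")
rca.ask("an error path skipped cleanup", "code review of the handler",
        prevention="release connections in a finally block")
rca.ask("the error path had no test", "coverage report")
rca.ask("error paths are not on the review checklist", "team checklist",
        prevention="add error-path testing to the checklist")
print(rca.to_markdown())
```

Requiring evidence for every level keeps the chain tied to observations rather than guesses.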
Production Issue Handling (--prod)
Production-Specific Considerations:
- Minimal Disruption: Prioritize system stability and user experience
- Rollback Readiness: Prepare immediate rollback options
- Monitoring Integration: Use existing monitoring and alerting systems
- Communication: Maintain stakeholder communication throughout
- Documentation: Record all changes and decisions for audit trail
Production Analysis:
- Deployment Correlation: Correlate issues with recent deployments
- Traffic Patterns: Analyze user traffic and usage patterns
- Configuration Changes: Review recent configuration modifications
- Resource Utilization: Monitor system resource usage and limits
- Service Dependencies: Check health of dependent services
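Deployment correlation can be sketched as a simple time-window filter. Everything in the example below is hypothetical: the `Deployment` record, the version labels, the timestamps, and the two-hour default window, which would need tuning to how quickly issues surface in a given system.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Deployment(NamedTuple):
    version: str
    finished_at: datetime


def suspect_deployments(deployments: List[Deployment],
                        first_error_at: datetime,
                        window: timedelta = timedelta(hours=2)) -> List[Deployment]:
    """Return deployments that finished within `window` before the first
    error, newest first."""
    suspects = [
        d for d in deployments
        if timedelta(0) <= first_error_at - d.finished_at <= window
    ]
    return sorted(suspects, key=lambda d: d.finished_at, reverse=True)


# Hypothetical deployment history and first error timestamp.
deployments = [
    Deployment("v1.4.0", datetime(2024, 1, 2, 8, 0)),
    Deployment("v1.4.1", datetime(2024, 1, 2, 13, 30)),
    Deployment("v1.4.2", datetime(2024, 1, 2, 14, 45)),
    Deployment("v1.5.0", datetime(2024, 1, 2, 16, 0)),
]
first_error_at = datetime(2024, 1, 2, 15, 0)
suspects = suspect_deployments(deployments, first_error_at)
for d in suspects:
    print(d.version, "finished", first_error_at - d.finished_at, "before the first error")
```

A deployment that finished after the first error is excluded, since it cannot have caused it.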
Safe Production Practices:
- Feature Flags: Use feature toggles to isolate problematic features
- Gradual Rollout: Implement fixes gradually with monitoring
- A/B Testing: Compare fix effectiveness with control groups
- Circuit Breakers: Implement circuit breakers for failing services
- Health Checks: Continuous health monitoring during fixes
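To make the circuit-breaker practice concrete, here is a minimal single-threaded sketch. The class name, thresholds, and injectable `clock` are assumptions chosen for illustration; a production breaker would also need thread safety and metrics, and established libraries exist for that.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so the demo needs no real waiting
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0      # success closes the circuit
        self.opened_at = None
        return result


# Demo with a fake clock and a hypothetical failing dependency.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("dependency down")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
print("open:", breaker.opened_at is not None)
```

While the circuit is open, callers fail fast instead of piling more load onto a struggling dependency.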
Investigation Tools and Techniques
Logging and Monitoring:
- Centralized log aggregation and analysis
- Real-time monitoring dashboards and alerts
- Distributed tracing for microservices
- Application performance monitoring (APM)
- Custom metrics and business intelligence
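Log pattern detection often starts with grouping messages that differ only in their variable parts. The sketch below is one simple approach, assuming plain-text log lines; the log format, the sample lines, and the helper names `signature` and `top_patterns` are hypothetical.

```python
import re
from collections import Counter

NUMBER = re.compile(r"\d+")


def signature(line: str) -> str:
    """Replace every run of digits so lines differing only in numbers match."""
    return NUMBER.sub("<n>", line)


def top_patterns(lines, level="ERROR", limit=3):
    """Count the most frequent message shapes among lines containing `level`."""
    counts = Counter(signature(line) for line in lines if level in line)
    return counts.most_common(limit)


# Hypothetical log excerpt.
lines = [
    "2024-01-02 10:00:01 ERROR db timeout after 3000 ms",
    "2024-01-02 10:00:02 INFO request 17 served",
    "2024-01-02 10:00:05 ERROR db timeout after 3010 ms",
    "2024-01-02 10:00:09 ERROR user 42 not found",
]
patterns = top_patterns(lines)
for pattern, count in patterns:
    print(count, pattern)
```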
Debugging Tools:
- Interactive debuggers and IDE integration
- Remote debugging capabilities
- Memory profilers and leak detectors
- Performance profiling tools
- Network traffic analyzers
Testing and Validation:
- Unit test creation for bug reproduction
- Integration testing for component interactions
- Load testing for performance issues
- Security testing for vulnerability assessment
- Chaos engineering for resilience testing
Deliverables
- Investigation Report: Comprehensive analysis of issue and findings
- Root Cause Analysis: Detailed five-whys analysis with evidence
- Solution Documentation: Fix implementation with rationale
- Prevention Plan: Measures to prevent similar issues
- Monitoring Enhancements: Improved detection and alerting
- Lessons Learned: Knowledge base updates and team learnings
Output Locations
- Incident Reports: .claudedocs/incidents/rca-{issue}-{timestamp}.md
- Investigation Logs: .claudedocs/reports/troubleshoot-{timestamp}.md
- Solution Documentation: .claudedocs/summaries/fix-{issue}-{timestamp}.md
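The three path templates above can be filled in with a small helper. This is a sketch under stated assumptions: the document does not specify the timestamp format or how `{issue}` is derived, so the UTC `%Y%m%dT%H%M%SZ` stamp and the `slugify` rule below are illustrative choices, as are the dictionary keys.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Templates copied from the output locations; the keys are hypothetical labels.
TEMPLATES = {
    "incident": ".claudedocs/incidents/rca-{issue}-{timestamp}.md",
    "log": ".claudedocs/reports/troubleshoot-{timestamp}.md",
    "fix": ".claudedocs/summaries/fix-{issue}-{timestamp}.md",
}


def slugify(text: str) -> str:
    """Assumed rule: lowercase, with non-alphanumeric runs collapsed to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def report_path(kind: str, issue: str = "", when: datetime = None) -> Path:
    """Build an output path; the timestamp format is an assumption."""
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # str.format ignores the unused `issue` argument for the "log" template.
    return Path(TEMPLATES[kind].format(issue=slugify(issue), timestamp=stamp))


example = report_path("incident", "Connection Timeout",
                      datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
print(example.as_posix())
```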
Research Requirements
External_Library_Research:
- Identify library/framework mentioned
- Context7 lookup for official documentation
- Verify API patterns and examples
- Check version compatibility
- Document findings in implementation
Pattern_Research:
- Search existing codebase for similar patterns
- Magic component search if UI-related
- WebSearch for official documentation
- Validate approach with Sequential thinking
- Document pattern choice rationale
API_Integration_Research:
- Official documentation lookup
- Authentication requirements
- Rate limiting and error handling
- SDK availability and examples
- Integration testing approach
Report Notifications
📄 Analysis report saved to: {path}
📊 Metrics updated: {path}
📋 Summary saved to: {path}
💾 Checkpoint created: {path}
📚 Documentation created: {path}
📁 Created directory: {path}
✅ {operation} completed successfully
❌ {operation} failed: {reason}
⚠ {operation} completed w/ warnings
Best Practices
Systematic Approach:
- Follow structured troubleshooting methodology
- Document findings and decisions throughout process
- Maintain objectivity and avoid assumption-based debugging
- Use data and evidence to drive investigation
- Consider multiple hypotheses before settling on solutions
Collaboration and Communication:
- Involve relevant team members and stakeholders
- Communicate status and findings regularly
- Share knowledge and learnings with team
- Document solutions for future reference
- Conduct post-incident reviews for improvement
Prevention Focus:
- Address root causes, not just symptoms
- Implement monitoring and alerting improvements
- Update processes and procedures based on learnings
- Enhance testing and quality assurance practices
- Build resilience and error handling into systems
Common Error Scenarios
Database Connection Issues
/troubleshoot --investigate --dependencies "connection timeout"
# → Checks DB connectivity, credentials, network latency
# → Verifies connection pool settings and timeouts
# → Tests failover mechanisms and retry logic
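The retry logic mentioned in the last comment might look like the sketch below: bounded attempts with exponential backoff. The `retry` helper, its defaults, and the `connect` stand-in are hypothetical; a real check would wrap the actual database driver call and catch that driver's exception types.

```python
import time


def retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep,
          retry_on=(ConnectionError, TimeoutError)):
    """Call operation(), retrying transient errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...


# Hypothetical connection that times out twice, then succeeds.
calls = {"count": 0}


def connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("connection timeout")
    return "connected"


delays = []
outcome = retry(connect, sleep=delays.append)  # record delays instead of sleeping
print(outcome, delays)
```

Passing `sleep` as a parameter keeps the example fast and makes the backoff schedule easy to verify in a test.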
Memory Leaks & Performance
/troubleshoot --performance --logs --binary-search
# → Profiles memory usage patterns over time
# → Identifies allocation hotspots and retention issues
# → Implements heap dump analysis and GC tuning
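As an illustration of profiling memory over time, the sketch below compares two snapshots with Python's standard `tracemalloc` module. This is snapshot comparison rather than heap dump analysis, and the leaking `handle_request` function is a hypothetical example; other runtimes have their own heap tools.

```python
import tracemalloc

leaky_cache = []  # hypothetical leak: grows on every call and is never trimmed


def handle_request(payload: str) -> int:
    leaky_cache.append(payload * 100)  # retained forever
    return len(payload)


tracemalloc.start()
before = tracemalloc.take_snapshot()
for i in range(1000):
    handle_request(f"request-{i}")
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Rank source lines by how much their retained memory changed.
diff = after.compare_to(before, "lineno")
for stat in diff[:3]:
    print(stat)
growth = sum(stat.size_diff for stat in diff)
print(f"net growth: {growth} bytes")
```

The top entries point at the source lines whose retained memory grew the most between the two snapshots, which is where to look for the retention issue.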
Production Emergencies
/troubleshoot --prod --investigate --timeline --critical
# → Creates incident timeline with system events
# → Preserves logs and state for post-mortem analysis
# → Implements safe rollback procedures if needed
Integration & API Failures
/troubleshoot --dependencies --network --cross-env
# → Tests API endpoints and service dependencies
# → Validates authentication and authorization flows
# → Checks rate limiting and circuit breaker status
Troubleshooting
- Complex Issues: Use --ultrathink --five-whys for comprehensive analysis
- Production Emergencies: Apply --prod --investigate --timeline for safe handling
- Performance Problems: Combine --performance --logs --binary-search
- Integration Issues: Use --dependencies --network --cross-env
Success Messages
✅ {operation} completed successfully
📝 Created: {file_path}
✏ Updated: {file_path}
✨ Task completed: {task_title}