SuperClaude/.claude/commands/troubleshoot.md
NomenAK 9c3608a783 Clean up references to deleted scripts and pattern system
- Removed references to validate-references.sh from YAML files
- Removed expand-references.sh from settings.local.json
- Cleaned up @pattern/@flags references from shared files
- Updated documentation to reflect current no-code implementation
- Simplified reference-index.yml to remove @include patterns

This cleanup removes confusion from the abandoned pattern reference
system while maintaining all functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-24 21:24:14 +02:00


/troubleshoot - Debug and resolve issues systematically

Legend

| Symbol | Meaning | Abbrev | Meaning |
|--------|---------|--------|---------|
| → | leads to | cfg | configuration |
| & | and/with | impl | implementation |
| w/ | with | perf | performance |
| @ | at/located | ops | operations |
| > | greater than | val | validation |
| ∀ | for all/every | req | requirements |
| ∃ | exists/there is | deps | dependencies |
| ∴ | therefore | env | environment |
| ∵ | because | db | database |
| ≡ | equivalent | api | interface |
| ≈ | approximately | docs | documentation |
| 📁 | directory/path | std | standard |
| 🔢 | number/count | def | default |
| 📝 | text/string | ctx | context |
| ⚙ | setting/config | err | error |
| 🎛 | control/flags | exec | execution |
| 🔧 | configuration | qual | quality |
| 📋 | group/category | rec | recovery |
| 🚨 | critical/urgent | sev | severity |
| ⚠ | warning/caution | resp | response |
| 🔄 | retry/recovery | esc | escalation |
| ✅ | success/fixed | tok | token |
| ❌ | failure/error | opt | optimization |
| ℹ | information | UX | user experience |
| ⚡ | fast/quick | UI | user interface |
| 🐌 | slow/delayed | C | critical |
| ✨ | complete/done | H | high |
| 📖 | read operation | M | medium |
| ✏ | edit operation | L | low |
| 🗑 | delete operation | | |

Purpose

Debug and resolve issues in code or systems specified in $ARGUMENTS using systematic troubleshooting methodologies and analysis techniques.

Syntax

/troubleshoot [flags] [issue-description]
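
To make the syntax concrete, the sketch below splits an invocation into its flags and its issue description. It is purely illustrative: the command is interpreted by Claude Code from this document, not by a parser, and `parse_troubleshoot` is a hypothetical helper name.

```python
import shlex


def parse_troubleshoot(command: str) -> tuple[list[str], str]:
    """Split a /troubleshoot invocation into (flags, issue description)."""
    tokens = shlex.split(command)  # honours quoted descriptions
    if not tokens or tokens[0] != "/troubleshoot":
        raise ValueError("expected a /troubleshoot command")
    rest = tokens[1:]
    flags = [t for t in rest if t.startswith("--")]
    description = " ".join(t for t in rest if not t.startswith("--"))
    return flags, description


flags, issue = parse_troubleshoot(
    '/troubleshoot --investigate --dependencies "connection timeout"'
)
print(flags)  # ['--investigate', '--dependencies']
print(issue)  # connection timeout
```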

Universal Flags

  • --plan: Show execution plan before running
  • --uc: UltraCompressed mode (~70% token reduction)
  • --ultracompressed: Alias for --uc
  • --think: Multi-file analysis w/ context (4K tokens)
  • --think-hard: Deep architectural analysis (10K tokens)
  • --ultrathink: Critical system redesign (32K tokens)
  • --c7: Enable Context7→library documentation lookup
  • --seq: Enable Sequential→complex analysis & thinking
  • --magic: Enable Magic→UI component generation
  • --pup: Enable Puppeteer→browser automation & testing
  • --all-mcp: Enable all MCP servers
  • --no-mcp: Disable all MCP servers (native tools only)
  • --no-c7: Disable Context7 specifically
  • --no-seq: Disable Sequential thinking specifically
  • --no-magic: Disable Magic UI builder specifically
  • --no-pup: Disable Puppeteer specifically

Command-Specific Flags

Troubleshooting Modes:

  • --investigate: Focus on understanding and analyzing issues without immediate fixes
  • --fix: Complete bug-fixing workflow with testing and verification (default)
  • --analyze: Deep technical analysis of complex system interactions
  • --diagnose: Systematic diagnostic approach with structured methodology

Analysis Methods:

  • --five-whys: Apply root cause analysis methodology iteratively
  • --binary-search: Use binary search approach to isolate problem scope
  • --timeline: Analyze issue timeline and recent changes
  • --dependencies: Focus on dependency and integration issues

Environment Focus:

  • --prod: Production-specific issue handling with minimal disruption
  • --staging: Staging environment debugging and testing
  • --local: Local development environment troubleshooting
  • --cross-env: Cross-environment consistency analysis

Investigation Tools:

  • --logs: Focus on log analysis and pattern detection
  • --performance: Performance profiling and bottleneck analysis
  • --security: Security-focused investigation and vulnerability analysis
  • --network: Network connectivity and API integration debugging

Examples

  • /troubleshoot --investigate --logs --think → Log analysis with context
  • /troubleshoot --five-whys --prod --think-hard → Production root cause analysis
  • /troubleshoot --fix --performance --ultrathink → Performance issue resolution
  • /troubleshoot --binary-search --dependencies → Systematic dependency debugging
  • /troubleshoot --analyze --security --network → Security and network analysis

Troubleshooting Workflow

1. Issue Reproduction and Understanding

  • Minimal Reproduction: Create smallest possible reproduction case
  • Behavior Documentation: Document expected vs actual behavior clearly
  • Impact Assessment: Identify affected components, users, and business impact
  • Severity Classification: Determine urgency and priority level
  • Environment Analysis: Understand where and when the issue occurs

2. Investigation and Isolation

  • Tool Utilization: Apply debugging tools and strategic logging
  • Scope Narrowing: Use binary search to isolate problem area
  • Change Analysis: Review recent changes using git history and blame
  • Data Collection: Analyze logs, stack traces, and monitoring data
  • Factor Elimination: Rule out environmental and configuration factors
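
Scope narrowing by binary search can be sketched as follows. The example assumes an ordered history in which the first entry is known good, the last is known bad, and every entry after the first bad one is also bad; the commit IDs and the `is_bad` check are made up for illustration. `git bisect` automates the same search over real commit history.

```python
from typing import Callable, Sequence


def first_bad(items: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first item for which is_bad is true.

    Assumes items[0] is good, items[-1] is bad, and badness never
    reverts once it appears.
    """
    lo, hi = 0, len(items) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return items[hi]


commits = ["a1", "b2", "c3", "d4", "e5", "f6"]  # hypothetical, oldest first
broken = {"d4", "e5", "f6"}                      # stand-in for a real test run
print(first_bad(commits, lambda c: c in broken))  # d4
```

Each check halves the remaining range, so six commits need at most three checks.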

3. Root Cause Analysis

  • Underlying Causes: Look beyond symptoms to find root causes
  • Five-Whys Method: Apply iterative questioning technique
  • Systemic Analysis: Consider broader system and process issues
  • Contributing Factors: Document all factors that led to the issue
  • Pattern Recognition: Identify similar issues and common causes

4. Solution Development (--fix mode)

  • Test Creation: Write failing test that reproduces the issue
  • Minimal Fix: Implement focused solution addressing root cause
  • Compatibility: Ensure backward compatibility and minimal disruption
  • Edge Cases: Consider side effects and edge case scenarios
  • Code Review: Apply standard code review and quality practices
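
As a sketch of the test-first step, the example below pins a bug with a regression test before the fix is accepted. `parse_port` and the bug report it refers to are hypothetical; the function is shown in its already-fixed state, so the test passes here and would have failed against the buggy version.

```python
import unittest


def parse_port(value: str) -> int:
    """Hypothetical function under repair: rejects ports outside 1-65535."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortRegression(unittest.TestCase):
    def test_rejects_out_of_range_port(self):
        # Reproduces the hypothetical report: "70000" was silently accepted.
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_accepts_valid_port(self):
        # Guards against the fix being too aggressive.
        self.assertEqual(parse_port("8080"), 8080)
```

Saved as a module, the tests can be run with `python -m unittest`.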

5. Verification and Prevention

  • Fix Validation: Verify solution resolves issue completely
  • Regression Testing: Run full test suite to prevent regressions
  • Realistic Testing: Test in production-like conditions
  • Monitoring: Add monitoring and alerting for early detection
  • Documentation: Record lessons learned and prevention measures

Investigation Techniques

Debugging Approaches:

  • Strategic Logging: Add targeted logging at key decision points
  • Breakpoint Analysis: Use debugger breakpoints for step-through analysis
  • State Inspection: Examine variable states and data structures
  • Call Stack Analysis: Trace execution paths and function calls
  • Memory Debugging: Analyze memory usage and potential leaks
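
Strategic logging might look like the sketch below: one log line per decision point, with the inputs that drove the decision. The `apply_discount` function and the `SAVE10` coupon are invented for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger("checkout")


def apply_discount(total: float, coupon: Optional[str]) -> float:
    logger.debug("apply_discount called: total=%.2f coupon=%r", total, coupon)
    if coupon is None:
        logger.debug("no coupon, returning total unchanged")
        return total
    if coupon == "SAVE10":
        discounted = round(total * 0.9, 2)
        logger.info("coupon %s applied: %.2f -> %.2f", coupon, total, discounted)
        return discounted
    # An unexpected branch is worth a warning so it shows up in log searches.
    logger.warning("unknown coupon %r ignored", coupon)
    return total


logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
apply_discount(100.0, "SAVE10")
```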

Performance Analysis:

  • Profiling Tools: Use performance profilers for bottleneck identification
  • Resource Monitoring: Track CPU, memory, and I/O usage patterns
  • Query Analysis: Analyze database queries and execution plans
  • Network Inspection: Monitor network requests and response times
  • Caching Evaluation: Assess caching effectiveness and hit rates
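
For the profiling step, a minimal sketch using the standard library's `cProfile` and `pstats` is shown below. `slow_sum` is a stand-in workload and `profile_call` is a hypothetical helper; a real investigation would profile the suspect code path instead.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Stand-in workload for the code being investigated."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args) -> str:
    """Run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # five costliest entries
    return buffer.getvalue()


print(profile_call(slow_sum, 200_000))
```

Sorting by cumulative time surfaces the call chains that dominate the run, which is usually the first question when hunting a bottleneck.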

System Analysis:

  • Configuration Review: Examine system and application configurations
  • Dependency Mapping: Map and analyze component dependencies
  • Integration Testing: Test inter-service communication and APIs
  • Infrastructure Analysis: Review infrastructure and deployment setup
  • Security Assessment: Analyze security configurations and access controls

Five-Whys Analysis (--five-whys)

Methodology:

  1. Problem Statement: Clearly define the observed problem
  2. First Why: Why did this problem occur? (immediate cause)
  3. Second Why: Why did that cause occur? (deeper cause)
  4. Third Why: Why did that deeper cause occur? (root cause)
  5. Fourth Why: Why does that root cause exist? (systemic cause)
  6. Fifth Why: Why is that system in place? (organizational cause)

Documentation:

  • Record each level of analysis with evidence
  • Document contributing factors at each level
  • Identify prevention measures for each cause
  • Propose systemic improvements to prevent recurrence
  • Create action items for short-term and long-term fixes
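
One way to record each level with its evidence is a small data structure like the sketch below. The class names, fields, and the sample incident are assumptions made for illustration, not part of the command.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    question: str
    answer: str
    evidence: str


@dataclass
class FiveWhys:
    problem: str
    whys: list[Why] = field(default_factory=list)

    def ask(self, answer: str, evidence: str) -> None:
        """Record the next level; the question targets the previous answer."""
        previous = self.whys[-1].answer if self.whys else self.problem
        self.whys.append(Why(f"Why: {previous}?", answer, evidence))

    def render(self) -> str:
        lines = [f"Problem: {self.problem}"]
        for depth, why in enumerate(self.whys, start=1):
            lines.append(f"{depth}. {why.question}")
            lines.append(f"   Because: {why.answer} (evidence: {why.evidence})")
        return "\n".join(lines)


rca = FiveWhys("checkout requests time out")  # hypothetical incident
rca.ask("the database pool is exhausted", "pool metrics dashboard")
rca.ask("connections are not released on error", "code review of the error path")
print(rca.render())
```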

Production Issue Handling (--prod)

Production-Specific Considerations:

  • Minimal Disruption: Prioritize system stability and user experience
  • Rollback Readiness: Prepare immediate rollback options
  • Monitoring Integration: Use existing monitoring and alerting systems
  • Communication: Maintain stakeholder communication throughout
  • Documentation: Record all changes and decisions for audit trail

Production Analysis:

  • Deployment Correlation: Correlate issues with recent deployments
  • Traffic Patterns: Analyze user traffic and usage patterns
  • Configuration Changes: Review recent configuration modifications
  • Resource Utilization: Monitor system resource usage and limits
  • Service Dependencies: Check health of dependent services
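
Deployment correlation can start with a simple question: which deployment most recently preceded the incident? The sketch below answers it with a binary search over sorted timestamps; the timestamps and the `deployment_before` helper are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def deployment_before(
    deploy_times: list[datetime], incident_start: datetime
) -> Optional[datetime]:
    """Return the latest deployment at or before incident_start, or None."""
    ordered = sorted(deploy_times)
    index = bisect_right(ordered, incident_start)
    return ordered[index - 1] if index else None


deploys = [
    datetime(2025, 6, 24, 9, 0),
    datetime(2025, 6, 24, 14, 30),
    datetime(2025, 6, 24, 18, 45),
]
incident = datetime(2025, 6, 24, 15, 10)
print(deployment_before(deploys, incident))  # 2025-06-24 14:30:00
```

A close match in time is only a lead, not proof; it tells the investigation which change set to inspect first.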

Safe Production Practices:

  • Feature Flags: Use feature toggles to isolate problematic features
  • Gradual Rollout: Implement fixes gradually with monitoring
  • A/B Testing: Compare fix effectiveness with control groups
  • Circuit Breakers: Implement circuit breakers for failing services
  • Health Checks: Continuous health monitoring during fixes
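
A circuit breaker, in its simplest form, stops calling a failing service for a cool-down period and then lets one trial call through. The sketch below is a minimal single-threaded illustration with assumed defaults (`max_failures=3`, `reset_after=30.0` seconds); production systems would normally use a maintained library instead.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the cool-down can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Wrapping calls as `breaker.call(fetch_profile, user_id)` (both names hypothetical) keeps a struggling dependency from being hammered while it recovers.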

Investigation Tools and Techniques

Logging and Monitoring:

  • Centralized log aggregation and analysis
  • Real-time monitoring dashboards and alerts
  • Distributed tracing for microservices
  • Application performance monitoring (APM)
  • Custom metrics and business intelligence

Debugging Tools:

  • Interactive debuggers and IDE integration
  • Remote debugging capabilities
  • Memory profilers and leak detectors
  • Performance profiling tools
  • Network traffic analyzers

Testing and Validation:

  • Unit test creation for bug reproduction
  • Integration testing for component interactions
  • Load testing for performance issues
  • Security testing for vulnerability assessment
  • Chaos engineering for resilience testing

Deliverables

  • Investigation Report: Comprehensive analysis of issue and findings
  • Root Cause Analysis: Detailed five-whys analysis with evidence
  • Solution Documentation: Fix implementation with rationale
  • Prevention Plan: Measures to prevent similar issues
  • Monitoring Enhancements: Improved detection and alerting
  • Lessons Learned: Knowledge base updates and team learnings

Output Locations

  • Incident Reports: .claudedocs/incidents/rca-{issue}-{timestamp}.md
  • Investigation Logs: .claudedocs/reports/troubleshoot-{timestamp}.md
  • Solution Documentation: .claudedocs/summaries/fix-{issue}-{timestamp}.md
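
The path templates above can be filled in as sketched below. The source does not specify the timestamp format or how `{issue}` is normalized, so the `%Y%m%d-%H%M%S` format and the `slugify` helper are assumptions.

```python
import re
from datetime import datetime, timezone
from typing import Optional

TEMPLATES = {
    "incident": ".claudedocs/incidents/rca-{issue}-{timestamp}.md",
    "log": ".claudedocs/reports/troubleshoot-{timestamp}.md",
    "fix": ".claudedocs/summaries/fix-{issue}-{timestamp}.md",
}


def slugify(text: str) -> str:
    """Lower-case the text and join alphanumeric runs with hyphens (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def output_path(kind: str, issue: str = "", when: Optional[datetime] = None) -> str:
    when = when or datetime.now(timezone.utc)
    return TEMPLATES[kind].format(
        issue=slugify(issue),
        timestamp=when.strftime("%Y%m%d-%H%M%S"),  # assumed timestamp format
    )


print(output_path("incident", "Connection timeout", datetime(2025, 6, 24, 21, 24, 14)))
# .claudedocs/incidents/rca-connection-timeout-20250624-212414.md
```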

Research Requirements

External_Library_Research:

  • Identify library/framework mentioned
  • Context7 lookup for official documentation
  • Verify API patterns and examples
  • Check version compatibility
  • Document findings in implementation

Pattern_Research:

  • Search existing codebase for similar patterns
  • Magic component search if UI-related
  • WebSearch for official documentation
  • Validate approach with Sequential thinking
  • Document pattern choice rationale

API_Integration_Research:

  • Official documentation lookup
  • Authentication requirements
  • Rate limiting and error handling
  • SDK availability and examples
  • Integration testing approach

Report Notifications

📄 Analysis report saved to: {path}
📊 Metrics updated: {path}
📋 Summary saved to: {path}
💾 Checkpoint created: {path}
📚 Documentation created: {path}
📁 Created directory: {path}
{operation} completed successfully
{operation} failed: {reason}
⚠ {operation} completed w/ warnings

Best Practices

Systematic Approach:

  • Follow structured troubleshooting methodology
  • Document findings and decisions throughout process
  • Maintain objectivity and avoid assumption-based debugging
  • Use data and evidence to drive investigation
  • Consider multiple hypotheses before settling on solutions

Collaboration and Communication:

  • Involve relevant team members and stakeholders
  • Communicate status and findings regularly
  • Share knowledge and learnings with team
  • Document solutions for future reference
  • Conduct post-incident reviews for improvement

Prevention Focus:

  • Address root causes, not just symptoms
  • Implement monitoring and alerting improvements
  • Update processes and procedures based on learnings
  • Enhance testing and quality assurance practices
  • Build resilience and error handling into systems

Common Error Scenarios

Database Connection Issues

/troubleshoot --investigate --dependencies "connection timeout"
# → Checks DB connectivity, credentials, network latency
# → Verifies connection pool settings and timeouts
# → Tests failover mechanisms and retry logic
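
The retry logic mentioned above is often implemented as retry with exponential backoff. The sketch below is one minimal form of it; the `retry` helper, its defaults, and the `flaky_connect` stand-in are hypothetical and not part of the command.

```python
import time


def retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation until it succeeds or attempts run out.

    The delay doubles after each failure: base_delay, 2x, 4x, ...
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


calls = []


def flaky_connect():
    """Stand-in for a database connect call that fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("connection timeout")
    return "connected"


delays = []
print(retry(flaky_connect, sleep=delays.append))  # connected
print(delays)  # [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast and makes the backoff schedule easy to verify.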

Memory Leaks & Performance

/troubleshoot --performance --logs --binary-search
# → Profiles memory usage patterns over time
# → Identifies allocation hotspots and retention issues
# → Implements heap dump analysis and GC tuning

Production Emergencies

/troubleshoot --prod --investigate --timeline
# → Creates incident timeline with system events
# → Preserves logs and state for post-mortem analysis
# → Implements safe rollback procedures if needed

Integration & API Failures

/troubleshoot --dependencies --network --cross-env
# → Tests API endpoints and service dependencies
# → Validates authentication and authorization flows
# → Checks rate limiting and circuit breaker status

Troubleshooting

  • Complex Issues: Use --ultrathink --five-whys for comprehensive analysis
  • Production Emergencies: Apply --prod --investigate --timeline for safe handling
  • Performance Problems: Combine --performance --logs --binary-search
  • Integration Issues: Use --dependencies --network --cross-env

Success Messages

{operation} completed successfully
📝 Created: {file_path}
✏ Updated: {file_path}
Task completed: {task_title}