SuperClaude/.claude/commands/shared/recovery-state-patterns.yml
NomenAK d24503ca02 refactor: Complete SuperClaude v2 migration with @include reference system
- Migrate all command files to use @include reference system
- Consolidate shared patterns into new yml structure
- Create central superclaude shared configuration files
- Remove deprecated markdown files (MCP.md, PERSONAS.md, RULES.md)
- Add new documentation structure in docs/
- Update installation script for new architecture
- Add ROADMAP.md and VERSION files

This completes the major architectural refactor to improve maintainability
and reduce duplication across the SuperClaude command system.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-25 16:51:53 +02:00


# Recovery & State Management - Consolidated Patterns
# Comprehensive state preservation, session recovery, and error handling
## Legend
@include universal-constants.yml#Universal_Legend
## Checkpoint Architecture
```yaml
Checkpoint_Structure:
  Storage_Hierarchy:
    Base_Directory: ".claudedocs/checkpoints/"
    Active_Checkpoint: ".claudedocs/checkpoints/active/"
    Archive_Directory: ".claudedocs/checkpoints/archive/"
    Recovery_Cache: ".claudedocs/checkpoints/recovery/"
  Checkpoint_Components:
    Metadata:
      checkpoint_id: "UUID v4 format"
      timestamp: "ISO 8601 with timezone"
      operation: "Command that triggered checkpoint"
      risk_level: "LOW|MEDIUM|HIGH|CRITICAL"
    State_Preservation:
      modified_files: "List → snapshots w/ content hash"
      git_state: "Branch, commit SHA, uncommitted changes"
      session_context: "Command history, todos, decisions, blockers"
      environment: "Working dir, env vars, tool versions"
      mcp_cache: "Active servers, cached responses, token usage"
Automatic_Triggers:
  Risk_Based:
    Critical: ["Database migrations", "Production deployments", "Security changes"]
    High: ["Multi-file refactoring", "Dependency updates", "Architecture changes"]
    Medium: ["Feature implementation", "Bug fixes affecting 3+ files"]
  Time_Based:
    Long_Operations: "Checkpoint every 10 minutes for ops >5 min"
    Session_Duration: "Checkpoint every 30 minutes"
  Command_Specific:
    Always: ["/deploy", "/migrate", "/cleanup --aggressive"]
    Conditional: ["/build → core systems", "/test → infrastructure"]
```
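# A minimal Python sketch of the Metadata fields and the content hash described above.
# The names `build_metadata` and `content_hash`, the example path, and the choice of
# SHA-256 are assumptions for illustration; the patterns only specify "UUID v4",
# "ISO 8601 with timezone", and "content hash".
```python
import hashlib
import uuid
from datetime import datetime, timezone

RISK_LEVELS = ("LOW", "MEDIUM", "HIGH", "CRITICAL")


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest (the hash algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_metadata(operation: str, risk_level: str) -> dict:
    """Build the Metadata component of a checkpoint."""
    if risk_level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {risk_level}")
    return {
        "checkpoint_id": str(uuid.uuid4()),                   # UUID v4
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601 with timezone
        "operation": operation,                               # command that triggered it
        "risk_level": risk_level,
    }


meta = build_metadata("/deploy", "CRITICAL")
# One entry of modified_files: a path plus the hash of its snapshotted content.
snapshot = {"path": "src/app.py", "sha256": content_hash(b"print('hi')\n")}
```
# Storing the timestamp in UTC keeps checkpoints comparable across machines; a local
# timezone offset would satisfy "ISO 8601 with timezone" equally well.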
## Session Recovery Patterns
```yaml
Session_Detection:
  Startup_Scan:
    Locations: [".claudedocs/tasks/in-progress/", ".claudedocs/tasks/pending/"]
    Parse_Metadata: "Extract task ID, title, status, branch, progress"
    Git_Validation: "Check branch exists, clean working tree"
    Context_Restoration: "Load session state, variables, decisions"
  Recovery_Decision_Matrix:
    Active_Tasks_Found: "Auto-resume most recent or highest priority"
    Multiple_Tasks: "Present selection menu w/ progress summary"
    Stale_Tasks: "Tasks >7 days old require confirmation"
    Corrupted_State: "Fallback → manual recovery prompts"
Context_Recovery:
  State_Reconstruction:
    File_Context: "Restore working file list & modification tracking"
    Variable_Context: "Reload session variables & configurations"
    Decision_Context: "Restore architectural & implementation decisions"
    Progress_Context: "Rebuild todo list from task breakdown & phase"
  Automatic_Recovery:
    Seamless_Resume: "Silent recovery for single active task"
    Smart_Restoration: "Rebuild TodoWrite from task state & progress"
    Blocker_Detection: "Identify & surface previous blockers"
    Next_Step_ID: "Determine optimal next action"
  session_state_format:
    current_files: [path1, path2]
    key_variables:
      api_endpoint: "https://api.example.com"
      database_name: "myapp_prod"
    decisions:
      - "Used React over Vue for team familiarity"
    blockers:
      - issue: "CORS error on API calls"
        attempted: ["headers", "proxy setup"]
        solution: "server-side CORS config needed"
```
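# The Recovery_Decision_Matrix above could be applied roughly as in this Python sketch.
# The `Task` fields, the returned action names, and the order in which the rules are
# checked (corrupted, then stale, then multiple, then single) are assumptions; the
# matrix itself does not define a precedence.
```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

STALE_AFTER = timedelta(days=7)  # "Tasks >7 days old require confirmation"


@dataclass
class Task:
    task_id: str
    title: str
    updated: datetime
    valid: bool = True  # False when the task file could not be parsed


def decide_recovery(tasks: List[Task], now: datetime) -> str:
    """Map the tasks found at startup to one recovery action."""
    if any(not t.valid for t in tasks):
        return "manual_recovery"   # Corrupted_State
    if not tasks:
        return "no_recovery"       # nothing to resume
    if any(now - t.updated > STALE_AFTER for t in tasks):
        return "confirm_stale"     # Stale_Tasks
    if len(tasks) > 1:
        return "selection_menu"    # Multiple_Tasks
    return "auto_resume"           # single active task: Seamless_Resume


now = datetime(2025, 6, 25, tzinfo=timezone.utc)
action = decide_recovery([Task("t1", "Add login form", now - timedelta(days=1))], now)
```
# Passing `now` explicitly keeps the staleness check deterministic and easy to test.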
## Error Classification & Recovery
```yaml
Error_Categories:
  Transient_Errors:
    Examples: ["Network timeouts", "Resource unavailable", "API rate limits"]
    Recovery: "Exponential backoff retry (max 3 attempts)"
  Environment_Errors:
    Examples: ["Command not found", "Module not installed", "Permission denied"]
    Recovery: "Guide user → environment fix"
    Auto_Fix: "Attempt if possible"
  Validation_Errors:
    Examples: ["Invalid format", "Syntax errors", "Missing parameters"]
    Recovery: "Request valid input w/ specific examples"
  Logic_Errors:
    Examples: ["Circular dependencies", "Conflicting requirements"]
    Recovery: "Explain issue & offer alternatives"
  Critical_Errors:
    Examples: ["Data corruption", "Security violations", "System failures"]
    Recovery: "Safe mode + checkpoint + user intervention"
Standard_Recovery_Patterns:
  Retry_Strategies:
    Exponential_Backoff:
      Initial_Delay: 1000ms
      Max_Delay: 30000ms
      Multiplier: 2
      Jitter: "Random 0-500ms"
    Circuit_Breaker:
      Failure_Threshold: 5
      Reset_Timeout: 60000ms
      Use_Case: "Protect failing services"
  Fallback_Chains:
    Tool_Failures: "grep → rg → native search → guide user"
    MCP_Cascades:
      Context7: "→ WebSearch → Local cache → Continue w/ warning"
      Sequential: "→ Native analysis → Simplified → User input"
      Magic: "→ Template generation → Manual guide → Examples"
  Graceful_Degradation:
    Feature_Reduction: "Complete → Basic → Minimal → Manual guide"
    Performance_Adaptation: "Reduce scope → faster alternatives → progressive output"
```
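# A Python sketch of the Exponential_Backoff settings above, combined with the
# "max 3 attempts" limit from Transient_Errors. The function names, the choice of
# retryable exception types, and adding jitter on top of the capped delay are
# assumptions, not part of the patterns.
```python
import random
import time

INITIAL_DELAY_MS = 1000
MAX_DELAY_MS = 30000
MULTIPLIER = 2
MAX_JITTER_MS = 500
MAX_ATTEMPTS = 3


def backoff_delay_ms(attempt: int, jitter_ms: float = 0.0) -> float:
    """Delay after failed attempt number `attempt` (0-based), capped before jitter."""
    base = min(INITIAL_DELAY_MS * MULTIPLIER ** attempt, MAX_DELAY_MS)
    return base + jitter_ms


def retry(operation, retryable=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call `operation` up to MAX_ATTEMPTS times, backing off between failures."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return operation()
        except retryable:
            if attempt == MAX_ATTEMPTS - 1:
                raise  # attempts exhausted: let the caller escalate
            jitter = random.uniform(0, MAX_JITTER_MS)
            sleep(backoff_delay_ms(attempt, jitter) / 1000)  # sleep() takes seconds
```
# With these values the base delays are 1000 ms and 2000 ms for the two retries; the
# 30000 ms cap only matters if the attempt limit is raised.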
## Recovery Implementation
```yaml
Recovery_Framework:
  Error_Handler_Lifecycle:
    Detect: "Catch → Classify → Assess severity → Capture context"
    Analyze: "Determine recovery options → Check retry eligibility → ID fallbacks"
    Respond: "Log w/ context → Attempt recovery → Track attempts → Escalate"
    Report: "User-friendly message → Technical details → Actions → Next steps"
  Progressive_Escalation:
    Level_1: "Automatic retry/recovery"
    Level_2: "Alternative approach"
    Level_3: "Degraded operation"
    Level_4: "Manual intervention"
    Level_5: "Abort w/ rollback"
Recovery_Commands:
  Checkpoint_Management:
    Create: "/checkpoint [--full|--light] [--note 'message']"
    List: "/checkpoint --list [--recent|--today|--filter]"
    Inspect: "/checkpoint --inspect {id}"
    Rollback: "/rollback [--quick|--full] [--files|--git|--env] [{id}]"
  Recovery_Actions:
    Quick_Rollback: "File contents only (sub-second)"
    Full_Restoration: "Complete state (files, git, environment)"
    Selective_Recovery: "Specific components only"
    Progressive_Recovery: "Step through checkpoint history"
```
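# The Progressive_Escalation ladder above can be read as "try each level in order
# until one succeeds". A Python sketch under that reading; the `escalate` function
# and the handler callables are hypothetical stand-ins for real recovery steps.
```python
from typing import Callable, List, Tuple

# Each entry pairs a level name with a handler that returns True on success.
Level = Tuple[str, Callable[[], bool]]


def escalate(levels: List[Level]) -> str:
    """Try each level in order and return the name of the first that succeeds."""
    for name, handler in levels:
        if handler():
            return name
    return "Level_5"  # nothing worked: abort w/ rollback


ladder = [
    ("Level_1", lambda: False),  # automatic retry failed
    ("Level_2", lambda: False),  # alternative approach failed
    ("Level_3", lambda: True),   # degraded operation succeeded
    ("Level_4", lambda: True),   # manual intervention, not reached here
]
result = escalate(ladder)
```
# Here `result` is "Level_3", because later levels are only tried after earlier ones fail.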
## Integration Patterns
```yaml
Command_Integration:
  Lifecycle_Hooks:
    Pre_Execution: "Create checkpoint before risky operations"
    During_Execution: "Progressive updates for long operations"
    Post_Execution: "Finalize or clean checkpoint based on outcome"
    On_Failure: "Preserve checkpoint → offer immediate rollback"
  Error_Context_Capture:
    Operation: "What was being attempted"
    Environment: "Working directory, user, permissions"
    Input: "Command/parameters that failed"
    Output: "Error messages & logs"
    State: "System state when error occurred"
Common_Error_Scenarios:
  Development:
    Module_Not_Found: "Check package.json → npm install → verify import"
    Build_Failures: "Clean cache → check syntax → verify deps → partial build"
    Test_Failures: "Isolate failing → check environment → skip & continue"
  System:
    Permission_Denied: "chmod/chown suggestions → alt locations → user dir fallback"
    Disk_Space: "Clean temp files → suggest cleanup → compression → minimal mode"
    Memory_Exhaustion: "Reduce scope → streaming → batching → manual chunking"
  Integration:
    API_Failures: "401/403→auth | 429→rate limit | 500→retry | cached fallback"
    Network_Issues: "Retry w/ backoff → alt endpoints → cached data → offline mode"
User_Experience:
  Status_Messages:
    Starting_Recovery: "🔄 Detecting previous session state..."
    Task_Found: "📋 Found active task: {title} ({progress}% complete)"
    Context_Restored: "✅ Session context restored - continuing work"
    Recovery_Failed: "⚠ Manual recovery needed"
  Error_Communication:
    What: "Build failed: Module not found"
    Why: "Package 'react' is not installed"
    How: "Run: npm install react"
    Then: "Retry the build command"
```
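# The API_Failures rule above maps HTTP status codes to recovery actions. A Python
# sketch of that mapping; the function name and action labels are hypothetical, and
# treating every 5xx status like 500 is an assumption beyond what the rule states.
```python
def api_failure_action(status: int) -> str:
    """Pick a recovery action for a failed API call from its HTTP status."""
    if status in (401, 403):
        return "auth"             # re-authenticate or fix credentials
    if status == 429:
        return "rate_limit"       # slow down before trying again
    if 500 <= status <= 599:
        return "retry"            # assumption: all server errors are retryable
    return "cached_fallback"      # anything else: fall back to cached data


actions = {code: api_failure_action(code) for code in (401, 429, 500, 404)}
```
# Keeping the mapping in one function makes the fallback order explicit and testable.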
## Performance Optimization
```yaml
Efficiency_Strategies:
  Incremental_Checkpoints: "Base checkpoint + delta updates"
  Parallel_Processing: "Concurrent file backups & state collection"
  Smart_Selection: "Only backup affected files → shallow copy unchanged"
  Resource_Management: "Stream large files → limit CPU → low priority I/O"
Retention_Policy:
  Automatic_Cleanup: "7 days age | 20 count limit | 1GB size limit"
  Protected_Checkpoints: "User-marked → never auto-delete"
  Archive_Strategy: "Tar.gz old checkpoints → metadata retention"
```
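# One way the Retention_Policy above could be enforced, sketched in Python. The
# `Checkpoint` fields and `select_for_cleanup` are hypothetical. Reading "1GB" as
# 1 GiB, counting protected checkpoints toward the limits, and deleting oldest
# first are assumptions.
```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

MAX_AGE = timedelta(days=7)
MAX_COUNT = 20
MAX_TOTAL_BYTES = 1024 ** 3  # "1GB", read here as 1 GiB (assumption)


@dataclass
class Checkpoint:
    checkpoint_id: str
    created: datetime
    size_bytes: int
    protected: bool = False  # user-marked: never auto-delete


def select_for_cleanup(checkpoints: List[Checkpoint], now: datetime) -> List[str]:
    """Return IDs to delete: expired ones, then oldest until limits are met."""
    doomed: List[str] = []
    kept: List[Checkpoint] = []
    # Newest first, so `kept` ends up ordered from newest to oldest.
    for cp in sorted(checkpoints, key=lambda c: c.created, reverse=True):
        if not cp.protected and now - cp.created > MAX_AGE:
            doomed.append(cp.checkpoint_id)
        else:
            kept.append(cp)

    def over_limit() -> bool:
        total = sum(c.size_bytes for c in kept)
        return len(kept) > MAX_COUNT or total > MAX_TOTAL_BYTES

    # Walk a copy from oldest to newest, dropping unprotected checkpoints.
    for cp in reversed(list(kept)):
        if not over_limit():
            break
        if not cp.protected:
            kept.remove(cp)
            doomed.append(cp.checkpoint_id)
    return doomed
```
# If protected checkpoints alone exceed a limit, this sketch leaves them in place
# and returns only the unprotected IDs.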
---
*Recovery & State Management v3 - Consolidated patterns for state preservation, session recovery, and error handling*