
Repository-Scoped Memory Management for AI Coding Assistants

Research Report | 2025-10-16

Executive Summary

This research investigates best practices for implementing repository-scoped memory management in AI coding assistants, with a specific focus on SuperClaude PM Agent integration. The key finding is that local file storage combined with git repository detection is the prevailing pattern for session isolation among the tools surveyed, and that it is the simplest option to set up, inspect, and run offline.

Key Recommendations for SuperClaude

  1. Adopt Local File Storage: Store memory in repository-specific directories (.superclaude/memory/ or docs/memory/)
  2. Use Git Detection: Implement git rev-parse --show-toplevel for repository boundary detection
  3. Prioritize Simplicity: Start with file-based approach before considering databases
  4. Stay Forward-Compatible: Keep future cross-repository intelligence possible as an optional feature

1. Industry Best Practices

1.1 Cursor IDE Memory Architecture

Implementation Pattern:

project-root/
├── .cursor/
│   └── rules/           # Project-specific configuration
├── .git/                # Repository boundary marker
└── memory-bank/         # Session context storage
    ├── project_context.md
    ├── progress_history.md
    └── architectural_decisions.md

Key Insights:

  • Repository-level isolation using .cursor/rules directory
  • Memory Bank pattern: structured knowledge repository for cross-session context
  • MCP integration (Graphiti) for sophisticated memory management across sessions
  • Problem: Users report context loss mid-task and excessive "start new chat" prompts

Relevance to SuperClaude: Validates local directory approach with repository-scoped configuration.


1.2 GitHub Copilot Workspace Context

Implementation Pattern:

  • Remote code search indexes for GitHub/Azure DevOps repositories
  • Local indexes for non-cloud repositories (limit: 2,500 files)
  • Respects .gitignore for index exclusion
  • Workspace-level context with repository-specific boundaries

Key Insights:

  • Automatic index building for GitHub-backed repos
  • .gitignore integration prevents sensitive data indexing
  • Repository authorization through GitHub App permissions
  • Limitation: Context scope is workspace-wide, not repository-specific by default

Relevance to SuperClaude: .gitignore integration is critical for security and performance.


1.3 Session Isolation Best Practices

Git Worktrees for Parallel Sessions:

# Enable multiple isolated Claude sessions
git worktree add ../feature-branch feature-branch
# Each worktree has independent working directory, shared git history

Context Window Management:

  • Long sessions lead to context pollution → performance degradation
  • Best Practice: Use /clear command between tasks
  • Create session-end context files (GEMINI.md, CONTEXT.md) for handoff
  • Break tasks into smaller, isolated chunks
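
The session-end handoff file mentioned above could be produced with something like the following. This is a minimal sketch, not an existing SuperClaude API: the function name write_handoff, its parameters, and the section layout of the note are all assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def write_handoff(path: Path, summary: str, next_steps: list[str]) -> str:
    """Write a short session-end handoff note and return its text.

    Hypothetical helper: the file layout below is an assumption, not a
    format prescribed by any of the tools discussed in this report.
    """
    lines = [
        f"# Session handoff ({date.today().isoformat()})",
        "",
        "## Summary",
        summary.strip(),
        "",
        "## Next steps",
        *[f"- {step}" for step in next_steps],
        "",
    ]
    text = "\n".join(lines)
    # Create the parent directory so the call works in a fresh repository.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

A new session can then read this one file instead of replaying the whole previous conversation.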

Enterprise Security Architecture (4-Layer Defense):

  1. Prevention: Rate-limit access, auto-strip credentials
  2. Protection: Encryption, project-level role-based access control
  3. Detection: SAST/DAST/SCA on pull requests
  4. Response: Detailed commit-prompt mapping

Relevance to SuperClaude: PM Agent should implement context reset between repository changes.


2. Git Repository Detection Patterns

2.1 Standard Detection Methods

Recommended Approach:

# Detect if current directory is in git repository
git rev-parse --git-dir

# Check if inside working tree
git rev-parse --is-inside-work-tree

# Get repository root
git rev-parse --show-toplevel

Implementation Considerations:

  • Git searches parent directories for a .git entry automatically (a directory in a normal clone, a file in linked worktrees and submodules)
  • libgit2 library recommended for programmatic access
  • Avoid direct .git folder parsing (fragile to git internals changes)
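
As an illustration of delegating detection to git rather than inspecting .git directly, the second command above can be wrapped as follows. This is a sketch under stated assumptions: the function name is_inside_work_tree and the 5-second timeout are illustrative choices, and a missing git binary or a nonexistent directory is simply treated as "not in a repository".

```python
import subprocess
from pathlib import Path


def is_inside_work_tree(path: Path) -> bool:
    """Return True if `path` is inside a git working tree.

    Hypothetical wrapper around `git rev-parse --is-inside-work-tree`.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=path,
            capture_output=True,
            text=True,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed, `path` does not exist, or git hung.
        return False
    # Inside the .git directory itself git exits 0 but prints "false".
    return result.returncode == 0 and result.stdout.strip() == "true"
```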

2.2 Security Concerns

  • Issue: Millions of .git folders exposed publicly by misconfiguration
  • Mitigation: Always respect .gitignore and add .superclaude/ to ignore patterns
  • Best Practice: Store sensitive memory data in gitignored directories

3. Storage Architecture Comparison

3.1 Local File Storage

Advantages:

  • Performance: Faster than databases for sequential reads
  • Simplicity: No database setup or maintenance
  • Portability: Works offline, no network dependencies
  • Developer-Friendly: Files are readable/editable by humans
  • Git Integration: Can be versioned (if desired) or gitignored

Disadvantages:

  • No ACID transactions
  • Limited query capabilities
  • Manual concurrency handling

Use Cases:

  • Perfect for: Session context, architectural decisions, project documentation
  • Not ideal for: High-concurrency writes, complex queries
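
The "manual concurrency handling" drawback can be partly mitigated by writing to a temporary file and renaming it into place, so a reader never sees a half-written file. The sketch below assumes JSON memory files; the name atomic_write_json is hypothetical. It does not prevent lost updates when two writers race (the last rename wins), which is one reason high-concurrency writes remain a poor fit for plain files.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON via a temp file plus rename (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if the write or rename failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```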

3.2 Database Storage

Advantages:

  • ACID transactions
  • Complex queries (SQL)
  • Concurrency management
  • Scalability for cross-repository intelligence (future)

Disadvantages:

  • Performance: Slower than local files for simple reads
  • Complexity: Database setup and maintenance overhead
  • Network Bottlenecks: If using remote database
  • Developer UX: Requires database tools to inspect

Use Cases:

  • Future feature: Cross-repository pattern mining
  • Not needed for: Basic repository-scoped memory

3.3 Vector Databases (Advanced)

Recommendation: Not needed for v1

Future Consideration:

  • Semantic search across project history
  • Pattern recognition across repositories
  • Requires significant infrastructure investment
  • Wait until: SuperClaude reaches "super-intelligence" level

4. SuperClaude PM Agent Recommendations

4.1 Immediate Implementation (v1)

Architecture:

project-root/
├── .git/                          # Repository boundary
├── .gitignore                     # Lists .superclaude/memory/ (see 4.4)
├── .superclaude/
│   └── memory/
│       ├── session_state.json     # Current session context
│       ├── pm_context.json        # PM Agent PDCA state
│       └── decisions/             # Architectural decision records
│           ├── 2025-10-16_auth.md
│           └── 2025-10-15_db.md
└── docs/
    └── superclaude/               # Human-readable documentation
        ├── patterns/              # Successful patterns
        └── mistakes/              # Error prevention

Detection Logic:

import subprocess
from pathlib import Path

def get_repository_root() -> Path | None:
    """Detect git repository root using git rev-parse."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True,
            text=True,
            timeout=5
        )
        if result.returncode == 0:
            return Path(result.stdout.strip())
    except (subprocess.TimeoutExpired, FileNotFoundError):
        pass
    return None

def get_memory_dir() -> Path:
    """Get repository-scoped memory directory, creating it if needed."""
    repo_root = get_repository_root()
    if repo_root:
        memory_dir = repo_root / ".superclaude" / "memory"
    else:
        # Fallback to global memory if not in git repo
        memory_dir = Path.home() / ".superclaude" / "memory" / "global"
    # Create the directory in both cases so callers can write immediately
    memory_dir.mkdir(parents=True, exist_ok=True)
    return memory_dir

Session Lifecycle Integration:

import json

# Session Start
def restore_session_context() -> dict:
    repo_root = get_repository_root()
    if not repo_root:
        return {}  # No repository context

    memory_file = repo_root / ".superclaude" / "memory" / "pm_context.json"
    if not memory_file.exists():
        return {}
    try:
        return json.loads(memory_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {}  # Corrupt or partially written file: start fresh

# Session End
def save_session_context(context: dict) -> None:
    repo_root = get_repository_root()
    if not repo_root:
        return  # Don't save if not in repository

    memory_file = repo_root / ".superclaude" / "memory" / "pm_context.json"
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    memory_file.write_text(json.dumps(context, indent=2), encoding="utf-8")

4.2 PM Agent Memory Management

PDCA Cycle Integration:

# Plan Phase
write_memory(repo_root / ".superclaude/memory/plan.json", {
    "hypothesis": "...",
    "success_criteria": "...",
    "risks": [...]
})

# Do Phase
write_memory(repo_root / ".superclaude/memory/experiment.json", {
    "trials": [...],
    "errors": [...],
    "solutions": [...]
})

# Check Phase
write_memory(repo_root / ".superclaude/memory/evaluation.json", {
    "outcomes": {...},
    "adherence_check": "...",
    "completion_status": "..."
})

# Act Phase
if success:
    move_to_patterns(repo_root / "docs/superclaude/patterns/pattern-name.md")
else:
    move_to_mistakes(repo_root / "docs/superclaude/mistakes/mistake-YYYY-MM-DD.md")
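
The helpers used above (write_memory, move_to_patterns, move_to_mistakes) are not defined in this report. One possible shape for them is sketched below, purely as an assumption: write_memory is a plain JSON write, and record_outcome is a hypothetical stand-in that covers both Act-phase branches with the file naming shown above.

```python
import json
from datetime import date
from pathlib import Path


def write_memory(path: Path, payload: dict) -> None:
    """Persist one PDCA phase as JSON (assumed behavior of write_memory)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def record_outcome(repo_root: Path, name: str, success: bool, notes: str) -> Path:
    """Act phase: file notes under patterns/ on success, mistakes/ otherwise.

    Hypothetical replacement for move_to_patterns / move_to_mistakes.
    """
    docs = repo_root / "docs" / "superclaude"
    if success:
        target = docs / "patterns" / f"{name}.md"
    else:
        # Mirrors the mistake-YYYY-MM-DD.md naming used above; a second
        # mistake recorded on the same day would overwrite the first.
        target = docs / "mistakes" / f"mistake-{date.today().isoformat()}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(notes, encoding="utf-8")
    return target
```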

4.3 Context Isolation Strategy

Problem: User switches from SuperClaude_Framework to airis-mcp-gateway

Current Behavior: PM Agent retains SuperClaude context → Noise

Desired Behavior: PM Agent detects repository change → Clears context → Loads airis-mcp-gateway context

Implementation:

import json
from pathlib import Path

class RepositoryContextManager:
    def __init__(self):
        self.current_repo = None
        self.context = {}

    def check_repository_change(self):
        """Detect if repository changed since last invocation."""
        new_repo = get_repository_root()

        if new_repo != self.current_repo:
            # Repository changed - clear context
            if self.current_repo:
                self.save_context(self.current_repo)

            self.current_repo = new_repo
            self.context = self.load_context(new_repo) if new_repo else {}

            return True  # Context cleared
        return False  # Same repository

    def load_context(self, repo_root: Path) -> dict:
        """Load repository-specific context."""
        memory_file = repo_root / ".superclaude" / "memory" / "pm_context.json"
        if memory_file.exists():
            return json.loads(memory_file.read_text())
        return {}

    def save_context(self, repo_root: Path):
        """Save current context to repository."""
        if not repo_root:
            return
        memory_file = repo_root / ".superclaude" / "memory" / "pm_context.json"
        memory_file.parent.mkdir(parents=True, exist_ok=True)
        memory_file.write_text(json.dumps(self.context, indent=2))

Usage in PM Agent:

# Session Start Protocol
context_mgr = RepositoryContextManager()
if context_mgr.check_repository_change():
    print(f"📍 Repository: {context_mgr.current_repo.name}")
    print(f"Previous: {context_mgr.context.get('last_session', 'No previous session')}")
    print(f"Progress: {context_mgr.context.get('progress', 'Starting fresh')}")

4.4 .gitignore Integration

Add to .gitignore:

# SuperClaude Memory (session-specific, not for version control)
.superclaude/memory/

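# To version architectural decisions instead (optional), replace the line
# above with the two lines below. Git cannot re-include a path whose parent
# directory is itself ignored, so the contents are ignored with /* rather
# than the directory.
# .superclaude/memory/*
# !.superclaude/memory/decisions/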

Rationale:

  • Session state changes frequently → should not be committed
  • Architectural decisions MAY be versioned (team decision)
  • Prevents accidental secret exposure in memory files
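
Adding the entry can be automated so that the PM Agent never writes memory into an unprotected repository. The following is a minimal sketch; ensure_gitignore_entry is a hypothetical helper, and it only checks for an exact line match rather than evaluating gitignore semantics.

```python
from pathlib import Path


def ensure_gitignore_entry(
    repo_root: Path, pattern: str = ".superclaude/memory/"
) -> bool:
    """Append `pattern` to .gitignore if missing.

    Returns True if the file was changed. Hypothetical helper: it compares
    whole lines only and does not interpret existing wildcard patterns.
    """
    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if pattern in (line.strip() for line in existing.splitlines()):
        return False
    # Avoid gluing the new entry onto a last line that lacks a newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(f"{existing}{separator}{pattern}\n", encoding="utf-8")
    return True
```

The function is idempotent, so it is safe to call at every session start.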

5. Future Enhancements (v2+)

5.1 Cross-Repository Intelligence

When to implement: After PM Agent demonstrates reliable single-repository context

Architecture:

~/.superclaude/
└── global_memory/
    ├── patterns/              # Cross-repo patterns
    │   ├── authentication.json
    │   └── testing.json
    └── repo_index/            # Repository metadata
        ├── SuperClaude_Framework.json
        └── airis-mcp-gateway.json

Smart Context Selection:

def get_relevant_context(current_repo: str) -> dict:
    """Select context based on current repository."""
    # Local context (high priority)
    local = load_local_context(current_repo)

    # Global patterns (low priority, filtered by relevance)
    global_patterns = load_global_patterns()
    relevant = filter_by_similarity(global_patterns, local.get('tech_stack'))

    return merge_contexts(local, relevant, priority="local")
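
The helpers filter_by_similarity and merge_contexts are left undefined above. A possible sketch, assuming each global pattern carries a "tech_stack" list and using Jaccard overlap as the similarity measure, is shown below. The 0.3 threshold and the "global_patterns" key are illustrative assumptions, not values from this research.

```python
from typing import Optional


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_by_similarity(
    patterns: list, tech_stack: Optional[list], threshold: float = 0.3
) -> list:
    """Keep patterns whose tech stack overlaps the local one (assumed metric)."""
    stack = set(tech_stack or [])
    return [
        p for p in patterns
        if jaccard(set(p.get("tech_stack", [])), stack) >= threshold
    ]


def merge_contexts(local: dict, relevant: list, priority: str = "local") -> dict:
    """Attach global patterns without overriding any local key."""
    if priority != "local":
        raise ValueError("this sketch only supports priority='local'")
    # Local keys are unpacked last, so they win on any collision.
    return {"global_patterns": relevant, **local}
```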

5.2 Vector Database Integration

When to implement: If SuperClaude requires semantic search across 100+ repositories

Use Case:

  • "Find all authentication implementations across my projects"
  • "What error handling patterns have I used successfully?"

Technology: pgvector, Qdrant, or Pinecone

Cost-Benefit: High complexity, only justified for "super-intelligence" tier features


6. Implementation Roadmap

Phase 1: Repository-Scoped File Storage (Immediate)

Timeline: 1-2 weeks | Effort: Low

  • Implement get_repository_root() detection
  • Create .superclaude/memory/ directory structure
  • Integrate with PM Agent session lifecycle
  • Add .superclaude/memory/ to .gitignore
  • Test repository change detection

Success Criteria:

  • PM Agent context isolated per repository
  • No noise from other projects
  • Session resumes correctly within same repository

Phase 2: PDCA Memory Integration (Short-term)

Timeline: 2-3 weeks | Effort: Medium

  • Integrate Plan/Do/Check/Act with file storage
  • Implement docs/superclaude/patterns/ and docs/superclaude/mistakes/
  • Create ADR (Architectural Decision Records) format
  • Add 7-day cleanup for docs/temp/
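
The 7-day cleanup item could be implemented along these lines. This is a sketch only: cleanup_old_files is a hypothetical name, file age is judged by modification time, and the injectable now parameter exists just to make the function easy to test.

```python
import time
from pathlib import Path
from typing import Optional


def cleanup_old_files(
    directory: Path, max_age_days: int = 7, now: Optional[float] = None
) -> list:
    """Delete regular files under `directory` older than `max_age_days`.

    Returns the list of removed paths. Hypothetical helper; directories
    themselves are left in place even if they end up empty.
    """
    if not directory.is_dir():
        return []
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed = []
    for path in sorted(directory.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```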

Success Criteria:

  • Successful patterns documented automatically
  • Mistakes recorded with prevention checklists
  • Knowledge accumulates within repository

Phase 3: Cross-Repository Patterns (Future)

Timeline: 3-6 months | Effort: High

  • Implement global pattern database
  • Smart context filtering by tech stack
  • Pattern similarity scoring
  • Opt-in cross-repo intelligence

Success Criteria:

  • PM Agent learns from past projects
  • Suggests relevant patterns from other repos
  • No performance degradation

7. Comparison Matrix

| Feature | Local Files | Database | Vector DB |
|---|---|---|---|
| Performance | Fast | Medium | Slow (network) |
| Simplicity | Simple | Complex | Very Complex |
| Setup Time | Minutes | Hours | Days |
| ACID Transactions | No | Yes | Yes |
| Query Capabilities | Basic | SQL | Semantic |
| Offline Support | Yes | ⚠️ Depends | No |
| Developer UX | Excellent | Good | Fair |
| Maintenance | None | Regular | Intensive |

Recommendation for SuperClaude v1: Local Files (clear winner for repository-scoped memory)


8. Security Considerations

8.1 Sensitive Data Handling

Problem: Memory files may contain secrets, API keys, internal URLs

Solution: Automatic redaction + gitignore

import re

SENSITIVE_PATTERNS = [
    r'sk_live_[a-zA-Z0-9]{24,}',  # Stripe keys
    r'eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+',  # JWT tokens (header.payload.signature)
    r'ghp_[a-zA-Z0-9]{36}',  # GitHub tokens
]

def redact_sensitive_data(text: str) -> str:
    """Remove sensitive data before storing in memory."""
    for pattern in SENSITIVE_PATTERNS:
        text = re.sub(pattern, '[REDACTED]', text)
    return text
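
Because the PM Agent context is a nested dict rather than a single string, redaction has to walk the structure before it is serialized. The sketch below is an assumption about how that could look; redact_context and TOKEN_PATTERN are hypothetical names, and the pattern covers only the GitHub and Stripe formats listed above.

```python
import re
from typing import Any

# Assumed subset of the patterns above, combined into one compiled regex.
TOKEN_PATTERN = re.compile(r"ghp_[a-zA-Z0-9]{36}|sk_live_[a-zA-Z0-9]{24,}")


def redact_context(value: Any) -> Any:
    """Return a copy of `value` with token-like strings replaced.

    Recurses through dicts and lists; other types are returned unchanged.
    Dict keys are not redacted in this sketch.
    """
    if isinstance(value, str):
        return TOKEN_PATTERN.sub("[REDACTED]", value)
    if isinstance(value, dict):
        return {key: redact_context(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_context(item) for item in value]
    return value
```

Calling redact_context(context) immediately before json.dumps keeps secrets out of pm_context.json even if the file is later committed by mistake.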

8.2 .gitignore Best Practices

Always gitignore:

  • .superclaude/memory/ (session state)
  • .superclaude/temp/ (temporary files)

Optional versioning (team decision):

  • .superclaude/memory/decisions/ (ADRs)
  • docs/superclaude/patterns/ (successful patterns)

9. Conclusion

Key Takeaways

  1. Local File Storage is Optimal: Industry standard for repository-scoped context
  2. Git Detection is Standard: Use git rev-parse --show-toplevel
  3. Start Simple, Evolve Later: Files → Database (if needed) → Vector DB (far future)
  4. Repository Isolation is Critical: Prevents context noise across projects

Recommended Architecture:

SuperClaude_Framework/
├── .git/
├── .gitignore (+.superclaude/memory/)
├── .superclaude/
│   └── memory/
│       ├── pm_context.json       # Current session state
│       ├── plan.json             # PDCA Plan phase
│       ├── experiment.json       # PDCA Do phase
│       └── evaluation.json       # PDCA Check phase
└── docs/
    └── superclaude/
        ├── patterns/             # Successful implementations
        │   └── authentication-jwt.md
        └── mistakes/             # Error prevention
            └── mistake-2025-10-16.md

Next Steps:

  1. Implement RepositoryContextManager class
  2. Integrate with PM Agent session lifecycle
  3. Add .superclaude/memory/ to .gitignore
  4. Test with repository switching scenarios
  5. Document for team adoption

Research Confidence: High (based on industry standards from Cursor, GitHub Copilot, and security best practices)

Sources:

  • Cursor IDE memory management architecture
  • GitHub Copilot workspace context documentation
  • Enterprise AI security frameworks
  • Git repository detection patterns
  • Storage performance benchmarks

Last Updated: 2025-10-16 | Next Review: After Phase 1 implementation (2-3 weeks)