SuperClaude/.claude-plugin/commands/pm.md
kazuki 373c313033 feat: PM Agent plugin architecture with confidence check test suite
## Plugin Architecture (Token Efficiency)
- Plugin-based PM Agent (97% token reduction vs slash commands)
- Lazy loading: 50 tokens at install, 1,632 tokens on /pm invocation
- Skills framework: confidence_check skill for hallucination prevention
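
The commit message does not say how the 97% figure was derived. A minimal sketch, assuming it compares the 50 tokens loaded at install against the 1,632 tokens loaded on invocation (the function name `token_reduction` is hypothetical):

```python
def token_reduction(baseline_tokens: int, loaded_tokens: int) -> float:
    """Return the fraction of tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - loaded_tokens / baseline_tokens


# Assumption: the baseline is the 1,632-token command body and the
# plugin keeps only 50 tokens in context until /pm is invoked.
reduction = token_reduction(baseline_tokens=1632, loaded_tokens=50)
print(f"{reduction:.1%}")  # prints 96.9%
```

Under that assumption the ratio rounds to the 97% quoted above.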

## Confidence Check Test Suite
- 8 test cases (4 categories × 2 cases each)
- Real data from agiletec commit history
- Precision/Recall evaluation (target: ≥0.9/≥0.85)
- Token overhead measurement (target: <150 tokens)
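
A rough sketch of how the precision/recall targets above could be checked. The actual `run_confidence_tests.py` is not shown here, so the `CaseResult` fields and both function names are hypothetical, and the sample results are made-up illustration data, not output from the test suite:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class CaseResult:
    expected_proceed: bool   # ground-truth label (hypothetical field)
    predicted_proceed: bool  # what the confidence check decided


def precision_recall(results: Iterable[CaseResult]) -> Tuple[float, float]:
    """Compute precision and recall for the 'proceed' decision."""
    results = list(results)
    tp = sum(r.expected_proceed and r.predicted_proceed for r in results)
    fp = sum((not r.expected_proceed) and r.predicted_proceed for r in results)
    fn = sum(r.expected_proceed and (not r.predicted_proceed) for r in results)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def meets_targets(precision: float, recall: float,
                  min_precision: float = 0.9, min_recall: float = 0.85) -> bool:
    """Compare against the targets stated above (>=0.9 / >=0.85)."""
    return precision >= min_precision and recall >= min_recall


# Illustration only: 8 invented outcomes (3 TP, 1 FN, 4 TN).
sample = (
    [CaseResult(True, True)] * 3
    + [CaseResult(True, False)]
    + [CaseResult(False, False)] * 4
)
precision, recall = precision_recall(sample)
print(precision, recall, meets_targets(precision, recall))
```

With only 8 cases a single miss moves recall by a large step, so the thresholds are sensitive to individual cases.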

## Research & Analysis
- PM Agent ROI analysis: Claude 4.5 baseline vs self-improving agents
- Evidence-based decision framework
- Performance benchmarking methodology

## Files Changed
### Plugin Implementation
- .claude-plugin/plugin.json: Plugin manifest
- .claude-plugin/commands/pm.md: PM Agent command
- .claude-plugin/skills/confidence_check.py: Confidence assessment
- .claude-plugin/marketplace.json: Local marketplace config

### Test Suite
- .claude-plugin/tests/confidence_test_cases.json: 8 test cases
- .claude-plugin/tests/run_confidence_tests.py: Evaluation script
- .claude-plugin/tests/EXECUTION_PLAN.md: Next session guide
- .claude-plugin/tests/README.md: Test suite documentation

### Documentation
- TEST_PLUGIN.md: Token efficiency comparison (slash vs plugin)
- docs/research/pm_agent_roi_analysis_2025-10-21.md: ROI analysis

### Code Changes
- src/superclaude/pm_agent/confidence.py: Updated confidence checks
- src/superclaude/pm_agent/token_budget.py: Deleted (replaced by /context)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 13:31:28 +09:00


| name | description | category | complexity | mcp-servers | skill |
|------|-------------|----------|------------|-------------|-------|
| pm | Project Manager Agent - Skills-based zero-footprint orchestration | orchestration | meta | | pm |

Activating PM Agent skill...

Loading: ~/.claude/skills/pm/implementation.md

Token Efficiency:

  • Startup overhead: 0 tokens (not loaded until /sc:pm)
  • Skill description: ~100 tokens
  • Full implementation: ~2,500 tokens (loaded on-demand)
  • Savings: 100% at startup; the implementation is loaded only when needed
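
The on-demand pattern described above can be illustrated with a small sketch. This is not how Claude Code itself loads skills; the `LazySkill` class and its attributes are hypothetical, and it only shows the idea of keeping a short description resident while deferring the full file read:

```python
from pathlib import Path
from typing import Optional


class LazySkill:
    """Hold a short description; read the implementation only on first use."""

    def __init__(self, name: str, description: str, implementation_path: str):
        self.name = name
        self.description = description  # small, always available
        self.implementation_path = Path(implementation_path).expanduser()
        self._body: Optional[str] = None  # full text, not loaded yet

    @property
    def loaded(self) -> bool:
        return self._body is not None

    def invoke(self) -> str:
        """Load the implementation file on first call, then reuse it."""
        if self._body is None:
            self._body = self.implementation_path.read_text(encoding="utf-8")
        return self._body
```

Constructing `LazySkill("pm", "...", "~/.claude/skills/pm/implementation.md")` touches no file; the read happens only inside `invoke()`.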

Core Capabilities (from skill):

  • 🔍 Pre-implementation confidence check (≥90% required)
  • Post-implementation self-validation
  • 🔄 Reflexion learning from mistakes
  • Parallel investigation and execution
  • 📊 Token-budget-aware operations

Session Start Protocol (auto-executes):

  1. Run `git status` to check repo state
  2. Check token budget from Claude Code UI
  3. Ready to accept tasks
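
Step 1 of the protocol above could be scripted roughly as follows. This is a sketch, not part of the PM Agent: the `repo_state` helper and its injectable `run` parameter are hypothetical, and step 2 (the token budget shown in the Claude Code UI) has no scriptable equivalent here:

```python
import subprocess


def repo_state(run=subprocess.run) -> str:
    """Return `git status --short --branch` output, or a short note on failure."""
    try:
        result = run(
            ["git", "status", "--short", "--branch"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The git executable is not on PATH.
        return "git not found"
    if result.returncode != 0:
        # Typically: the current directory is not inside a repository.
        return result.stderr.strip() or "not a git repository"
    return result.stdout.strip()
```

Passing `run` as a parameter keeps the helper testable without a real repository.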

Confidence Check (before implementation):

  1. Receive task from user
  2. Investigation phase (loop until confident):
    • Read existing code (Glob/Grep/Read)
    • Read official documentation (WebFetch/WebSearch)
    • Reference working OSS implementations (Deep Research)
    • Use Repo index for existing patterns
    • Identify root cause and solution
  3. Self-evaluate confidence:
    • <90%: Continue investigation (back to step 2)
    • ≥90%: Root cause + solution confirmed → Proceed to implementation
  4. Implementation phase (only when ≥90%)

Key principle:

  • Investigation: Loop as much as needed, use parallel searches
  • Implementation: Only when "almost certain" about root cause and fix
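
The investigate-then-gate loop above can be sketched as a small function. This is an illustration under stated assumptions, not the code in `confidence_check.py`: `confidence_gate`, `gather_evidence`, and `assess` are hypothetical names, and the `max_rounds` cap is added here as a safety limit that the protocol itself does not specify.

```python
from typing import Any, Callable, List, Tuple

CONFIDENCE_THRESHOLD = 0.90  # the >=90% requirement stated above


def confidence_gate(
    gather_evidence: Callable[[int], Any],
    assess: Callable[[List[Any]], float],
    threshold: float = CONFIDENCE_THRESHOLD,
    max_rounds: int = 10,  # assumption: cap added for this sketch
) -> Tuple[bool, float, List[Any]]:
    """Loop investigation until confidence reaches the threshold.

    Returns (proceed, confidence, evidence). `proceed` is True only
    when the last assessment met the threshold.
    """
    evidence: List[Any] = []
    confidence = 0.0
    for round_no in range(max_rounds):
        evidence.append(gather_evidence(round_no))  # step 2: investigate
        confidence = assess(evidence)               # step 3: self-evaluate
        if confidence >= threshold:
            return True, confidence, evidence       # step 4 may begin
    return False, confidence, evidence


# Toy run: confidence rises as evidence accumulates (invented scores).
scores = [0.4, 0.7, 0.92]
proceed, confidence, evidence = confidence_gate(
    gather_evidence=lambda n: f"finding-{n}",
    assess=lambda ev: scores[len(ev) - 1],
)
print(proceed, confidence, len(evidence))  # prints True 0.92 3
```

Returning the evidence alongside the decision lets a caller report why the gate opened or stayed closed.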

Memory Management:

  • No automatic memory loading (zero-footprint)
  • Use /sc:load to explicitly load context from Mindbase MCP (vector search, ~250-550 tokens)
  • Use /sc:save to persist session state to Mindbase MCP

Next?