feat: add universal document sharding support with dual-strategy loading

Implement comprehensive document sharding system across all BMM workflows enabling 90%+ token savings for large multi-epic projects through selective loading optimization.

## Document Sharding System

### Core Features

- **Universal Support**: All 12 BMM workflows (Phase 1-4) handle both whole and sharded documents
- **Dual Loading Strategy**: Full Load (Phase 1-3) vs Selective Load (Phase 4)
- **Automatic Discovery**: Workflows detect format transparently (whole → sharded priority)
- **Efficiency Optimization**: 90%+ token reduction for 10+ epic projects in Phase 4

### Implementation Details

**Phase 1-3 Workflows (7 workflows) - Full Load Strategy:**

- product-brief, prd, gdd, create-ux-design, tech-spec, architecture, solutioning-gate-check
- Load entire sharded documents when present
- Transparent to user experience
- Better organization for large projects

**Phase 4 Workflows (5 workflows) - Selective Load Strategy:**

- sprint-planning (Full Load exception - needs all epics)
- epic-tech-context, create-story, story-context, code-review (Selective Load)
- Load ONLY the specific epic needed (e.g., epic-3.md for Epic 3 stories)
- Massive efficiency: Skip loading 9 other epics in 10-epic project

### Workflow Enhancements

**Added to all workflows:**

- `input_file_patterns` in workflow.yaml with wildcard discovery
- Document Discovery section in instructions.md
- Support for sharded index + section files
- Brownfield `docs/index.md` support

**Pattern standardization:**

```yaml
input_file_patterns:
  document:
    whole: "{output_folder}/*doc*.md"
    sharded: "{output_folder}/*doc*/index.md"
    sharded_single: "{output_folder}/*doc*/section-{{id}}.md" # Selective load
```

### Retrospective Workflow Major Overhaul

Transformed retrospective into immersive, interactive team experience:

**Epic Discovery Priority (Fixed):**

- Priority 1: Check sprint-status.yaml for last completed epic
- Priority 2: Ask user directly
- Priority 3: Scan stories folder (last resort)

**New Capabilities:**

- Deep story analysis: Extract dev notes, mistakes, review feedback, lessons learned
- Previous retro integration: Track action items, verify lessons applied
- Significant change detection: Alert when discoveries require epic updates
- Intent-based facilitation: Natural conversation vs scripted phrases
- Party mode protocol: Clear speaker identification (Name (Role): dialogue)
- Team dynamics: Drama, disagreements, diverse perspectives, authentic conflict

**Structure:**

- 12 whole-number steps (no decimals)
- Highly interactive with constant user engagement
- Cross-references previous retro for accountability
- Synthesizes patterns across all stories
- Detects architectural assumption changes

## Documentation

**Created:**

- `docs/document-sharding-guide.md` - Comprehensive 300+ line guide
  - What is sharding, when to use it (token thresholds)
  - How sharding works (discovery system, loading strategies)
  - Using shard-doc tool
  - Full Load vs Selective Load patterns
  - Complete examples and troubleshooting
  - Custom workflow integration patterns

**Updated:**

- `README.md` - Added Document Sharding feature section
- `docs/index.md` - Added under Advanced Topics → Optimization
- `src/modules/bmm/workflows/README.md` - Added sharding section with usage
- `src/modules/bmb/workflows/create-workflow/workflow-creation-guide.md` - Added complete implementation patterns for workflow builders

**Documentation levels:**

1. Overview (README.md) - Quick feature highlight
2. User guide (BMM workflows README) - Practical usage
3. Reference (document-sharding-guide.md) - Complete details
4. Builder guide (workflow-creation-guide.md) - Implementation patterns

## Efficiency Gains

**Example: 10-Epic Project**

Before sharding:

- epic-tech-context for Epic 3: Load all 10 epics (~50k tokens)
- create-story for Epic 3: Load all 10 epics (~50k tokens)
- story-context for Epic 3: Load all 10 epics (~50k tokens)

After sharding with selective load:

- epic-tech-context for Epic 3: Load Epic 3 only (~5k tokens) = 90% reduction
- create-story for Epic 3: Load Epic 3 only (~5k tokens) = 90% reduction
- story-context for Epic 3: Load Epic 3 only (~5k tokens) = 90% reduction

## Breaking Changes

None - fully backward compatible. Workflows work with existing whole documents.

## Files Changed

**Workflows Updated (25 files):**

- 7 Phase 1-3 workflows: Added full load sharding support
- 5 Phase 4 workflows: Added selective load sharding support
- 1 retrospective workflow: Complete overhaul with sharding support

**Documentation (5 files):**

- Created: document-sharding-guide.md
- Updated: README.md, docs/index.md, BMM workflows README, BMB workflow-creation-guide
- Removed: Old conversion report (obsolete)

## Future Extensibility

- BMB workflows now aware of sharding patterns
- Custom modules can easily implement sharding support
- Standard patterns documented for consistency
- No need to explain concept in future development
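
For illustration only, the whole-first, then-sharded discovery priority described in this message could be sketched in Python as below. In this commit the discovery is carried out by the agent following `instructions.md`, not by code, so the `discover` function and its return values are hypothetical. The sketch also simplifies one step: it globs every `.md` file next to `index.md` rather than reading the section list out of the index.

```python
from __future__ import annotations

from pathlib import Path


def discover(output_folder: Path, name: str) -> tuple[str, list[Path]]:
    """Hypothetical helper: locate a document, preferring the whole file.

    Mirrors the `whole` / `sharded` glob patterns from workflow.yaml:
      whole:   {output_folder}/*name*.md
      sharded: {output_folder}/*name*/index.md
    """
    # Priority 1: a single whole document wins if it exists.
    whole = sorted(p for p in output_folder.glob(f"*{name}*.md") if p.is_file())
    if whole:
        return "whole", whole[:1]

    # Priority 2: fall back to a sharded folder that contains index.md.
    indexes = sorted(output_folder.glob(f"*{name}*/index.md"))
    if indexes:
        index = indexes[0]
        # Simplification (assumption): treat every other .md file in the
        # folder as a section instead of parsing the index for its list.
        sections = sorted(p for p in index.parent.glob("*.md") if p.name != "index.md")
        return "sharded", [index, *sections]

    return "missing", []
```

A Full Load workflow would read every path returned for the sharded case, while a Selective Load workflow would read the index and then only the one section it needs.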
2025-12-29 16:14:59 +00:00 · 2025-11-02 00:13:33 -05:00
parent f77babcd5e
commit 3d4ea5ffd2
32 changed files with 2397 additions and 437 deletions
--- a/src/modules/bmm/workflows/4-implementation/code-review/instructions.md
+++ b/src/modules/bmm/workflows/4-implementation/code-review/instructions.md
@@ -16,6 +16,35 @@

 <critical>DOCUMENT OUTPUT: Technical review reports. Structured findings with severity levels and action items. User skill level ({user_skill_level}) affects conversation style ONLY, not review content.</critical>

+## 📚 Document Discovery - Selective Epic Loading
+
+**Strategy**: This workflow needs only ONE specific epic and its stories for review context, not all epics. This provides huge efficiency gains when epics are sharded.
+
+**Epic Discovery Process (SELECTIVE OPTIMIZATION):**
+
+1. **Determine which epic** you need (epic_num from story being reviewed - e.g., story "3-2-feature-name" needs Epic 3)
+2. **Check for sharded version**: Look for `epics/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand structure
+   - **Load ONLY `epic-{epic_num}.md`** (e.g., `epics/epic-3.md` for Epic 3)
+   - DO NOT load all epic files - only the one needed!
+   - This is the key efficiency optimization for large multi-epic projects
+4. **If whole document found**: Load the complete `epics.md` file and extract the relevant epic
+
+**Other Documents (architecture, ux-design) - Full Load:**
+
+1. **Search for whole document first** - Use fuzzy file matching
+2. **Check for sharded version** - If whole document not found, look for `{doc-name}/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand structure
+   - Read ALL section files listed in the index
+   - Treat combined content as single document
+4. **Brownfield projects**: The `document-project` workflow creates `{output_folder}/docs/index.md`
+
+**Priority**: If both whole and sharded versions exist, use the whole document.
+
+**UX-Heavy Projects**: Always check for ux-design documentation as it provides critical context for reviewing UI-focused stories.
+
 <workflow>

  <step n="1" goal="Find story ready for review" tag="sprint-status">
--- a/src/modules/bmm/workflows/4-implementation/code-review/workflow.yaml
+++ b/src/modules/bmm/workflows/4-implementation/code-review/workflow.yaml
@@ -51,6 +51,26 @@ recommended_inputs:
  - tech_spec: "Epic technical specification document (auto-discovered)"
  - story_context_file: "Story context file (.context.xml) (auto-discovered)"

+# Smart input file references - handles both whole docs and sharded docs
+# Priority: Whole document first, then sharded version
+# Strategy: SELECTIVE LOAD - only load the specific epic needed for this story review
+input_file_patterns:
+  architecture:
+    whole: "{output_folder}/*architecture*.md"
+    sharded: "{output_folder}/*architecture*/index.md"
+
+  ux_design:
+    whole: "{output_folder}/*ux*.md"
+    sharded: "{output_folder}/*ux*/index.md"
+
+  epics:
+    whole: "{output_folder}/*epic*.md"
+    sharded_index: "{output_folder}/*epic*/index.md"
+    sharded_single: "{output_folder}/*epic*/epic-{{epic_num}}.md"
+
+  document_project:
+    sharded: "{output_folder}/docs/index.md"
+
 standalone: true

 web_bundle: false
--- a/src/modules/bmm/workflows/4-implementation/create-story/instructions.md
+++ b/src/modules/bmm/workflows/4-implementation/create-story/instructions.md
@@ -7,6 +7,35 @@
 <critical>This workflow creates or updates the next user story from epics/PRD and architecture context, saving to the configured stories directory and optionally invoking Story Context.</critical>
 <critical>DOCUMENT OUTPUT: Concise, technical, actionable story specifications. Use tables/lists for acceptance criteria and tasks.</critical>

+## 📚 Document Discovery - Selective Epic Loading
+
+**Strategy**: This workflow needs only ONE specific epic and its stories, not all epics. This provides huge efficiency gains when epics are sharded.
+
+**Epic Discovery Process (SELECTIVE OPTIMIZATION):**
+
+1. **Determine which epic** you need (epic_num from story context - e.g., story "3-2-feature-name" needs Epic 3)
+2. **Check for sharded version**: Look for `epics/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand structure
+   - **Load ONLY `epic-{epic_num}.md`** (e.g., `epics/epic-3.md` for Epic 3)
+   - DO NOT load all epic files - only the one needed!
+   - This is the key efficiency optimization for large multi-epic projects
+4. **If whole document found**: Load the complete `epics.md` file and extract the relevant epic
+
+**Other Documents (prd, architecture, ux-design) - Full Load:**
+
+1. **Search for whole document first** - Use fuzzy file matching
+2. **Check for sharded version** - If whole document not found, look for `{doc-name}/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand structure
+   - Read ALL section files listed in the index
+   - Treat combined content as single document
+4. **Brownfield projects**: The `document-project` workflow creates `{output_folder}/docs/index.md`
+
+**Priority**: If both whole and sharded versions exist, use the whole document.
+
+**UX-Heavy Projects**: Always check for ux-design documentation as it provides critical context for UI-focused stories.
+
 <workflow>

  <step n="1" goal="Load config and initialize">
--- a/src/modules/bmm/workflows/4-implementation/create-story/workflow.yaml
+++ b/src/modules/bmm/workflows/4-implementation/create-story/workflow.yaml
@@ -44,6 +44,30 @@ recommended_inputs:
  - prd: "PRD document"
  - architecture: "Architecture (optional)"

+# Smart input file references - handles both whole docs and sharded docs
+# Priority: Whole document first, then sharded version
+# Strategy: SELECTIVE LOAD - only load the specific epic needed for this story
+input_file_patterns:
+  prd:
+    whole: "{output_folder}/*prd*.md"
+    sharded: "{output_folder}/*prd*/index.md"
+
+  architecture:
+    whole: "{output_folder}/*architecture*.md"
+    sharded: "{output_folder}/*architecture*/index.md"
+
+  ux_design:
+    whole: "{output_folder}/*ux*.md"
+    sharded: "{output_folder}/*ux*/index.md"
+
+  epics:
+    whole: "{output_folder}/*epic*.md"
+    sharded_index: "{output_folder}/*epic*/index.md"
+    sharded_single: "{output_folder}/*epic*/epic-{{epic_num}}.md"
+
+  document_project:
+    sharded: "{output_folder}/docs/index.md"
+
 standalone: true

 web_bundle: false
--- a/src/modules/bmm/workflows/4-implementation/epic-tech-context/instructions.md
+++ b/src/modules/bmm/workflows/4-implementation/epic-tech-context/instructions.md
@@ -7,6 +7,35 @@
 <critical>This workflow generates a comprehensive Technical Specification from PRD and Architecture, including detailed design, NFRs, acceptance criteria, and traceability mapping.</critical>
 <critical>If required inputs cannot be auto-discovered HALT with a clear message listing missing documents, allow user to provide them to proceed.</critical>

+## 📚 Document Discovery - Selective Epic Loading
+
+**Strategy**: This workflow needs only ONE specific epic and its stories, not all epics. This provides huge efficiency gains when epics are sharded.
+
+**Epic Discovery Process (SELECTIVE OPTIMIZATION):**
+
+1. **Determine which epic** you need (epic_num from workflow context or user input)
+2. **Check for sharded version**: Look for `epics/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand structure
+   - **Load ONLY `epic-{epic_num}.md`** (e.g., `epics/epic-3.md` for Epic 3)
+   - DO NOT load all epic files - only the one needed!
+   - This is the key efficiency optimization for large multi-epic projects
+4. **If whole document found**: Load the complete `epics.md` file and extract the relevant epic
+
+**Other Documents (prd, gdd, architecture, ux-design) - Full Load:**
+
+1. **Search for whole document first** - Use fuzzy file matching
+2. **Check for sharded version** - If whole document not found, look for `{doc-name}/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand structure
+   - Read ALL section files listed in the index
+   - Treat combined content as single document
+4. **Brownfield projects**: The `document-project` workflow creates `{output_folder}/docs/index.md`
+
+**Priority**: If both whole and sharded versions exist, use the whole document.
+
+**UX-Heavy Projects**: Always check for ux-design documentation as it provides critical context for UI-focused epics and stories.
+
 <workflow>
  <step n="1" goal="Collect inputs and discover next epic" tag="sprint-status">
    <action>Identify PRD and Architecture documents from recommended_inputs. Attempt to auto-discover at default paths.</action>
--- a/src/modules/bmm/workflows/4-implementation/epic-tech-context/workflow.yaml
+++ b/src/modules/bmm/workflows/4-implementation/epic-tech-context/workflow.yaml
@@ -9,17 +9,43 @@ user_name: "{config_source}:user_name"
 communication_language: "{config_source}:communication_language"
 date: system-generated

-# Inputs expected ( check output_folder or ask user if missing)
+# Inputs expected (check output_folder or ask user if missing)
 recommended_inputs:
  - prd
  - gdd
-  - spec
  - architecture
-  - ux_spec
-  - ux-design
-  - if there is an index.md then read the index.md to find other related docs that could be relevant
+  - ux_design
+  - epics (only the specific epic needed for this tech spec)
  - prior epic tech-specs for model, style and consistency reference

+# Smart input file references - handles both whole docs and sharded docs
+# Priority: Whole document first, then sharded version
+# Strategy: SELECTIVE LOAD - only load the specific epic needed (epic_num from context)
+input_file_patterns:
+  prd:
+    whole: "{output_folder}/*prd*.md"
+    sharded: "{output_folder}/*prd*/index.md"
+
+  gdd:
+    whole: "{output_folder}/*gdd*.md"
+    sharded: "{output_folder}/*gdd*/index.md"
+
+  architecture:
+    whole: "{output_folder}/*architecture*.md"
+    sharded: "{output_folder}/*architecture*/index.md"
+
+  ux_design:
+    whole: "{output_folder}/*ux*.md"
+    sharded: "{output_folder}/*ux*/index.md"
+
+  epics:
+    whole: "{output_folder}/*epic*.md"
+    sharded_index: "{output_folder}/*epic*/index.md"
+    sharded_single: "{output_folder}/*epic*/epic-{{epic_num}}.md"
+
+  document_project:
+    sharded: "{output_folder}/docs/index.md"
+
 # Workflow components
 installed_path: "{project-root}/bmad/bmm/workflows/4-implementation/epic-tech-context"
 template: "{installed_path}/template.md"
--- a/src/modules/bmm/workflows/4-implementation/retrospective/instructions.md
+++ b/src/modules/bmm/workflows/4-implementation/retrospective/instructions.md
--- a/src/modules/bmm/workflows/4-implementation/retrospective/workflow.yaml
+++ b/src/modules/bmm/workflows/4-implementation/retrospective/workflow.yaml
@@ -21,6 +21,34 @@ trigger: "Run AFTER completing an epic"
 required_inputs:
  - agent_manifest: "{project-root}/bmad/_cfg/agent-manifest.csv"

+# Smart input file references - handles both whole docs and sharded docs
+# Priority: Whole document first, then sharded version
+# Strategy: SELECTIVE LOAD - only load the completed epic and relevant retrospectives
+input_file_patterns:
+  epics:
+    whole: "{output_folder}/*epic*.md"
+    sharded_index: "{output_folder}/*epic*/index.md"
+    sharded_single: "{output_folder}/*epic*/epic-{{epic_num}}.md"
+
+  previous_retrospective:
+    pattern: "{output_folder}/retrospectives/epic-{{prev_epic_num}}-retro-*.md"
+
+  architecture:
+    whole: "{output_folder}/*architecture*.md"
+    sharded: "{output_folder}/*architecture*/index.md"
+
+  prd:
+    whole: "{output_folder}/*prd*.md"
+    sharded: "{output_folder}/*prd*/index.md"
+
+  document_project:
+    sharded: "{output_folder}/docs/index.md"
+
+# Required files
+sprint_status_file: "{output_folder}/sprint-status.yaml"
+story_directory: "{config_source}:dev_story_location"
+retrospectives_folder: "{output_folder}/retrospectives"
+
 output_artifacts:
  - retrospective_summary: "Comprehensive review of what went well and what could improve"
  - lessons_learned: "Key insights for future epics"
--- a/src/modules/bmm/workflows/4-implementation/sprint-planning/instructions.md
+++ b/src/modules/bmm/workflows/4-implementation/sprint-planning/instructions.md
@@ -3,6 +3,23 @@
 <critical>The workflow execution engine is governed by: {project-root}/bmad/core/tasks/workflow.xml</critical>
 <critical>You MUST have already loaded and processed: {project-root}/bmad/bmm/workflows/4-implementation/sprint-planning/workflow.yaml</critical>

+## 📚 Document Discovery - Full Epic Loading
+
+**Strategy**: Sprint planning needs ALL epics and stories to build complete status tracking.
+
+**Epic Discovery Process:**
+
+1. **Search for whole document first** - Look for `epics.md`, `bmm-epics.md`, or any `*epic*.md` file
+2. **Check for sharded version** - If whole document not found, look for `epics/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand the document structure
+   - Read ALL epic section files listed in the index (e.g., `epic-1.md`, `epic-2.md`, etc.)
+   - Process all epics and their stories from the combined content
+   - This ensures complete sprint status coverage
+4. **Priority**: If both whole and sharded versions exist, use the whole document
+
+**Fuzzy matching**: Be flexible with document names - users may use variations like `epics.md`, `bmm-epics.md`, `user-stories.md`, etc.
+
 <workflow>

 <step n="1" goal="Parse epic files and extract all work items">
--- a/src/modules/bmm/workflows/4-implementation/sprint-planning/workflow.yaml
+++ b/src/modules/bmm/workflows/4-implementation/sprint-planning/workflow.yaml
@@ -33,6 +33,14 @@ variables:
  # Output configuration
  status_file: "{output_folder}/sprint-status.yaml"

+# Smart input file references - handles both whole docs and sharded docs
+# Priority: Whole document first, then sharded version
+# Strategy: FULL LOAD - sprint planning needs ALL epics to build complete status
+input_file_patterns:
+  epics:
+    whole: "{output_folder}/*epic*.md"
+    sharded: "{output_folder}/*epic*/index.md"
+
 # Output configuration
 default_output_file: "{status_file}"

--- a/src/modules/bmm/workflows/4-implementation/story-context/instructions.md
+++ b/src/modules/bmm/workflows/4-implementation/story-context/instructions.md
@@ -11,6 +11,35 @@

 <critical>DOCUMENT OUTPUT: Technical context file (.context.xml). Concise, structured, project-relative paths only.</critical>

+## 📚 Document Discovery - Selective Epic Loading
+
+**Strategy**: This workflow needs only ONE specific epic and its stories, not all epics. This provides huge efficiency gains when epics are sharded.
+
+**Epic Discovery Process (SELECTIVE OPTIMIZATION):**
+
+1. **Determine which epic** you need (epic_num from story key - e.g., story "3-2-feature-name" needs Epic 3)
+2. **Check for sharded version**: Look for `epics/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand structure
+   - **Load ONLY `epic-{epic_num}.md`** (e.g., `epics/epic-3.md` for Epic 3)
+   - DO NOT load all epic files - only the one needed!
+   - This is the key efficiency optimization for large multi-epic projects
+4. **If whole document found**: Load the complete `epics.md` file and extract the relevant epic
+
+**Other Documents (prd, architecture, ux-design) - Full Load:**
+
+1. **Search for whole document first** - Use fuzzy file matching
+2. **Check for sharded version** - If whole document not found, look for `{doc-name}/index.md`
+3. **If sharded version found**:
+   - Read `index.md` to understand structure
+   - Read ALL section files listed in the index
+   - Treat combined content as single document
+4. **Brownfield projects**: The `document-project` workflow creates `{output_folder}/docs/index.md`
+
+**Priority**: If both whole and sharded versions exist, use the whole document.
+
+**UX-Heavy Projects**: Always check for ux-design documentation as it provides critical context for UI-focused stories.
+
 <workflow>
  <step n="1" goal="Find drafted story and check for existing context" tag="sprint-status">
    <check if="{{story_path}} is provided">
--- a/src/modules/bmm/workflows/4-implementation/story-context/workflow.yaml
+++ b/src/modules/bmm/workflows/4-implementation/story-context/workflow.yaml
@@ -23,6 +23,30 @@ variables:
  story_path: "" # Optional: Explicit story path. If not provided, finds first story with status "drafted"
  story_dir: "{config_source}:dev_story_location"

+# Smart input file references - handles both whole docs and sharded docs
+# Priority: Whole document first, then sharded version
+# Strategy: SELECTIVE LOAD - only load the specific epic needed for this story
+input_file_patterns:
+  prd:
+    whole: "{output_folder}/*prd*.md"
+    sharded: "{output_folder}/*prd*/index.md"
+
+  architecture:
+    whole: "{output_folder}/*architecture*.md"
+    sharded: "{output_folder}/*architecture*/index.md"
+
+  ux_design:
+    whole: "{output_folder}/*ux*.md"
+    sharded: "{output_folder}/*ux*/index.md"
+
+  epics:
+    whole: "{output_folder}/*epic*.md"
+    sharded_index: "{output_folder}/*epic*/index.md"
+    sharded_single: "{output_folder}/*epic*/epic-{{epic_num}}.md"
+
+  document_project:
+    sharded: "{output_folder}/docs/index.md"
+
 # Output configuration
 # Uses story_key from sprint-status.yaml (e.g., "1-2-user-authentication")
 default_output_file: "{story_dir}/{{story_key}}.context.xml"
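
As a rough sketch of the selective-load step used by the Phase 4 workflows above, the snippet below derives `epic_num` from a story key such as `3-2-feature-name` and substitutes it into the `sharded_single` pattern. Both function names are hypothetical, and the assumption that the epic number is the leading integer of the story key comes from the examples in the instructions rather than from any parser in this commit.

```python
import re


def epic_num_from_story_key(story_key: str) -> int:
    """Hypothetical helper: '3-2-feature-name' -> 3 (epic 3, story 2)."""
    match = re.match(r"(\d+)-(\d+)-", story_key)
    if match is None:
        raise ValueError(f"unexpected story key format: {story_key!r}")
    return int(match.group(1))


def sharded_epic_path(pattern: str, output_folder: str, epic_num: int) -> str:
    """Fill the placeholders of a `sharded_single` pattern.

    The result still contains the `*epic*` wildcard, so it must be
    resolved with a glob before the file can be opened.
    """
    return (
        pattern
        .replace("{{epic_num}}", str(epic_num))
        .replace("{output_folder}", output_folder)
    )


pattern = "{output_folder}/*epic*/epic-{{epic_num}}.md"
epic_num = epic_num_from_story_key("3-2-feature-name")
resolved = sharded_epic_path(pattern, "docs", epic_num)
```

Only the file matching the resolved pattern is loaded. The other epic files in the sharded folder are never read, which is where the token savings come from.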