# Document Sharding Guide

Comprehensive guide to BMad Method's document sharding system for managing large planning and architecture documents.

## Table of Contents

- [What is Document Sharding?](#what-is-document-sharding)
- [When to Use Sharding](#when-to-use-sharding)
- [How Sharding Works](#how-sharding-works)
- [Using the Shard-Doc Tool](#using-the-shard-doc-tool)
- [Workflow Support](#workflow-support)
- [Best Practices](#best-practices)
- [Examples](#examples)
- [Custom Workflow Integration](#custom-workflow-integration)
- [Troubleshooting](#troubleshooting)
- [Related Documentation](#related-documentation)

## What is Document Sharding?

Document sharding splits large markdown files into smaller, organized files based on level 2 headings (`## Heading`). This enables:

- **Selective Loading** - Workflows load only the sections they need
- **Reduced Token Usage** - Massive efficiency gains for large projects
- **Better Organization** - Logical section-based file structure
- **Maintained Context** - Index file preserves document structure

### Architecture

```
Before Sharding:
docs/
└── PRD.md (large 50k token file)

After Sharding:
docs/
└── prd/
    ├── index.md                    # Table of contents with descriptions
    ├── overview.md                 # Section 1
    ├── user-requirements.md        # Section 2
    ├── technical-requirements.md   # Section 3
    └── ...                         # Additional sections
```

## When to Use Sharding

### Ideal Candidates

**Large Multi-Epic Projects:**

- Very large complex PRDs
- Architecture documents with multiple system layers
- Epic files with 4+ epics (especially for Phase 4)
- UX design specs covering multiple subsystems

**Token Thresholds:**

- **Consider sharding**: Documents > 20k tokens
- **Strongly recommended**: Documents > 40k tokens
- **Critical for efficiency**: Documents > 60k tokens

### When NOT to Shard

**Small Projects:**

- Single epic projects
- Level 0-1 projects (tech-spec only)
- Documents under 10k tokens
- Quick prototypes

**Frequently Updated Docs:**

- Active work-in-progress documents
- Documents updated daily
- Documents where whole-file context is essential

## How Sharding Works

### Sharding Process

1. **Tool Execution**: Run `npx @kayvan/markdown-tree-parser source.md destination/`. The core shard-doc task wraps this command and is installed as a slash command or a manual task rule, depending on your tools.
2. **Section Extraction**: Tool splits by level 2 headings
3. **File Creation**: Each section becomes a separate file
4. **Index Generation**: `index.md` created with structure and descriptions

### Workflow Discovery

BMad workflows use a **dual discovery system**:

1. **Try whole document first** - Look for `document-name.md`
2. **Check for sharded version** - Look for `document-name/index.md`
3. **Priority rule** - Whole document takes precedence if both exist

### Loading Strategies

**Full Load (Phase 1-3 workflows):**

```
If sharded:
- Read index.md
- Read ALL section files
- Treat as single combined document
```

**Selective Load (Phase 4 workflows):**

```
If sharded epics and working on Epic 3:
- Read epics/index.md
- Load ONLY epics/epic-3.md
- Skip all other epic files
- About 90% token savings on a 10-epic project
```

## Using the Shard-Doc Tool

### CLI Command

```bash
# Activate bmad-master or analyst agent, then:
/shard-doc
```

### Interactive Process

```
Agent: Which document would you like to shard?
User: docs/PRD.md

Agent: Default destination: docs/prd/
       Accept default? [y/n]
User: y

Agent: Sharding PRD.md...
       ✓ Created 12 section files
       ✓ Generated index.md
       ✓ Complete!
```

### What Gets Created

**index.md structure:**

```markdown
# PRD - Index

## Sections

1. [Overview](./overview.md) - Project vision and objectives
2. [User Requirements](./user-requirements.md) - Feature specifications
3. [Epic 1: Authentication](./epic-1-authentication.md) - User auth system
4. [Epic 2: Dashboard](./epic-2-dashboard.md) - Main dashboard UI
...
```

**Individual section files:**

- Named from heading text (kebab-case)
- Contains complete section content
- Preserves all markdown formatting
- Can be read independently

## Workflow Support

### Universal Support

**All BMM workflows support both formats:**

- ✅ Whole documents
- ✅ Sharded documents
- ✅ Automatic detection
- ✅ Transparent to user

### Workflow-Specific Patterns

#### Phase 1-3 (Full Load)

Workflows load entire sharded documents:

- `product-brief` - Research, brainstorming docs
- `prd` - Product brief, research
- `gdd` - Game brief, research
- `create-ux-design` - PRD, brief, architecture (if available)
- `tech-spec` - Brief, research
- `architecture` - PRD, UX design (if available)
- `create-epics-and-stories` - PRD, architecture
- `implementation-readiness` - All planning docs

#### Phase 4 (Selective Load)

Workflows load only the sections they need, with one exception:

**sprint-planning** (Full Load):

- Needs ALL epics to build complete status

**create-story, code-review** (Selective):

```
Working on Epic 3, Story 2:
✓ Load epics/epic-3.md only
✗ Skip epics/epic-1.md, epic-2.md, epic-4.md, etc.

Result: about 90% token reduction for 10-epic projects
```

### Input File Patterns

Workflows use standardized patterns:

```yaml
input_file_patterns:
  prd:
    whole: '{output_folder}/*prd*.md'
    sharded: '{output_folder}/*prd*/index.md'

  epics:
    whole: '{output_folder}/*epic*.md'
    sharded_index: '{output_folder}/*epic*/index.md'
    sharded_single: '{output_folder}/*epic*/epic-{{epic_num}}.md'
```

## Best Practices

### Sharding Strategy

**Do:**

- ✅ Shard after planning phase complete
- ✅ Keep level 2 headings well-organized
- ✅ Use descriptive section names
- ✅ Shard before Phase 4 implementation
- ✅ Keep original file as backup initially

**Don't:**

- ❌ Shard work-in-progress documents
- ❌ Shard small documents (<20k tokens)
- ❌ Mix sharded and whole versions
- ❌ Manually edit index.md structure

### Naming Conventions

**Good Section Names:**

```markdown
## Epic 1: User Authentication
## Technical Requirements
## System Architecture
## UX Design Principles
```

**Poor Section Names:**

```markdown
## Section 1
## Part A
## Details
## More Info
```

### File Management

**When to Re-shard:**

- Significant structural changes to document
- Adding/removing major sections
- After major refactoring

**Updating Sharded Docs:**

1. Edit individual section files directly
2. OR edit original, delete sharded folder, re-shard
3. Don't manually edit index.md

## Examples

### Example 1: Large PRD

**Scenario:** 15-epic project, PRD is 45k tokens

**Before Sharding:**

```
Every workflow loads entire 45k token PRD
Architecture workflow: 45k tokens
UX design workflow: 45k tokens
```

**After Sharding:**

```bash
/shard-doc
Source: docs/PRD.md
Destination: docs/prd/

Created:
  prd/index.md
  prd/overview.md (3k tokens)
  prd/functional-requirements.md (8k tokens)
  prd/non-functional-requirements.md (6k tokens)
  prd/user-personas.md (4k tokens)
  ...additional FR/NFR sections
```

**Result:**

```
Architecture workflow: Can load specific sections needed
UX design workflow: Can load specific sections needed
Significant token reduction for large requirement docs!
```

### Example 2: Sharding Epics File

**Scenario:** 8 epics with detailed stories, 35k tokens total

```bash
/shard-doc
Source: docs/bmm-epics.md
Destination: docs/epics/

Created:
  epics/index.md
  epics/epic-1.md
  epics/epic-2.md
  ...
  epics/epic-8.md
```

**Efficiency Gain:**

```
Working on Epic 5 stories:
Old: Load all 8 epics (35k tokens)
New: Load epic-5.md only (4k tokens)
Savings: ~89% reduction
```

### Example 3: Architecture Document

**Scenario:** Multi-layer system architecture, 28k tokens

```bash
/shard-doc
Source: docs/architecture.md
Destination: docs/architecture/

Created:
  architecture/index.md
  architecture/system-overview.md
  architecture/frontend-architecture.md
  architecture/backend-services.md
  architecture/data-layer.md
  architecture/infrastructure.md
  architecture/security-architecture.md
```

**Benefit:** Code-review workflow can reference specific architectural layers without loading entire architecture doc.

## Custom Workflow Integration

### For Workflow Builders

When creating custom workflows that load large documents:

**1. Add input_file_patterns to workflow.yaml:**

```yaml
input_file_patterns:
  your_document:
    whole: '{output_folder}/*your-doc*.md'
    sharded: '{output_folder}/*your-doc*/index.md'
```

**2. Add discovery instructions to instructions.md:**

```markdown
## Document Discovery

1. Search for whole document: `*your-doc*.md`
2. Check for sharded version: `*your-doc*/index.md`
3. If sharded: Read index + ALL sections (or specific sections if selective load)
4. Priority: Whole document first
```

**3. Choose loading strategy:**

- **Full Load**: Read all sections when sharded
- **Selective Load**: Read only relevant sections (requires section identification logic)

### Pattern Templates

**Full Load Pattern:**

```xml
Search for document: {output_folder}/*doc-name*.md
If not found, check for sharded: {output_folder}/*doc-name*/index.md
Read index.md to understand structure
Read ALL section files listed in index
Combine content as single document
```

**Selective Load Pattern (with section ID):**

```xml
Determine section needed (e.g., epic_num = 3)
Check for sharded version: {output_folder}/*doc-name*/index.md
Read ONLY the specific section file needed
Skip all other section files
```

## Troubleshooting

### Common Issues

**Both whole and sharded exist:**

- Workflows will use whole document (priority rule)
- Delete or archive the one you don't want

**Index.md out of sync:**

- Delete sharded folder
- Re-run shard-doc on original

**Workflow can't find document:**

- Check file naming matches patterns (`*prd*.md`, `*epic*.md`, etc.)
- Verify index.md exists in sharded folder
- Check output_folder path in config

**Sections too granular:**

- Combine sections in original document
- Use fewer level 2 headings
- Re-shard

## Related Documentation

- [shard-doc Tool](../src/core/tools/shard-doc.xml) - Tool implementation
- [BMM Workflows Guide](../src/modules/bmm/workflows/README.md) - Workflow overview
- [Workflow Creation Guide](../src/modules/bmb/workflows/create-workflow/workflow-creation-guide.md) - Custom workflow patterns

---

**Document sharding is optional but powerful** - use it when efficiency matters for large projects!
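
### Appendix: Discovery and Loading Sketch

As an illustration of the discovery and loading rules described in this guide (whole document first, then `index.md`, then full or selective load), here is a minimal Python sketch. It is not how BMad workflows are implemented: they follow these rules through workflow instructions, and the function names `discover_document` and `load_document` are hypothetical. The sketch assumes section links in `index.md` look like `[Title](./file.md)`, and that a full load combines the section files in index order.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed link shape in index.md: [Title](./section-file.md)
LINK_RE = re.compile(r"\]\((?:\./)?([^)#]+\.md)\)")


def discover_document(output_folder: Path, name: str) -> Tuple[Optional[Path], bool]:
    """Return (path, is_sharded). The whole document wins if both forms exist."""
    whole = sorted(p for p in output_folder.glob(f"*{name}*.md") if p.is_file())
    if whole:
        return whole[0], False  # priority rule: whole document first
    indexes = sorted(output_folder.glob(f"*{name}*/index.md"))
    if indexes:
        return indexes[0], True
    return None, False


def load_document(output_folder: Path, name: str, section: Optional[str] = None) -> str:
    """Full load when section is None, selective load otherwise.

    section is a shard file stem such as "epic-3" (hypothetical parameter).
    It is ignored when a whole document is found.
    """
    path, is_sharded = discover_document(output_folder, name)
    if path is None:
        raise FileNotFoundError(f"no whole or sharded document matching '{name}'")
    if not is_sharded:
        return path.read_text(encoding="utf-8")

    shard_dir = path.parent
    if section is not None:
        # Selective load: read only the requested section file.
        return (shard_dir / f"{section}.md").read_text(encoding="utf-8")

    # Full load: read every section listed in the index, in index order.
    index_text = path.read_text(encoding="utf-8")
    parts = []
    for rel in LINK_RE.findall(index_text):
        section_path = shard_dir / rel
        if section_path.is_file():
            parts.append(section_path.read_text(encoding="utf-8"))
    return "\n".join(parts)
```

For example, `load_document(Path("docs"), "epic", section="epic-3")` would read only `docs/epics/epic-3.md` when no whole epics file exists. Note that `Path.glob` is case-sensitive on most POSIX systems, so a file named `PRD.md` would not match `*prd*.md` in this sketch.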