Port TEA commands into workflows and preload Murat knowledge (#660)

* Port TEA commands into workflows and preload Murat knowledge

* Broke the giant knowledge dump into curated fragments under src/modules/bmm/testarch/knowledge/

* Updated the web bundles for TEA, and made spot updates for analyst and SM

* Replaced the old TEA brief with an indexed knowledge system: the agent now loads topic-specific
  docs from knowledge/ via tea-index.csv, workflows reference those fragments, and risk/level/
  priority guidance lives in the new fragment files

---------

Co-authored-by: Murat Ozcan <murat@mac.lan>
Murat K Ozcan
2025-09-30 15:19:55 -05:00
committed by GitHub
parent 30fb0e67e1
commit df0c3e4bae
51 changed files with 1139 additions and 901 deletions

View File

@@ -18,10 +18,8 @@ last-redoc-date: 2025-09-30
- Architect `*solution-architecture`
2. Confirm `bmad/bmm/config.yaml` defines `project_name`, `output_folder`, `dev_story_location`, and language settings.
3. Ensure a test framework setup exists; if not, use the `*framework` command to create one prior to development.
4. Skim supporting references under `./testarch/`:
- `tea-knowledge.md`
- `test-levels-framework.md`
- `test-priorities-matrix.md`
4. Skim supporting references (knowledge under `testarch/`, command workflows under `workflows/testarch/`).
- `tea-index.csv` + `knowledge/*.md`
## High-Level Cheat Sheets
@@ -125,31 +123,40 @@ last-redoc-date: 2025-09-30
## Command Catalog
| Command | Task File | Primary Outputs | Notes |
| -------------- | -------------------------------- | -------------------------------------------------------------------- | ------------------------------------------------ |
| `*framework` | `testarch/framework.md` | Playwright/Cypress scaffold, `.env.example`, `.nvmrc`, sample specs | Use when no production-ready harness exists |
| `*atdd` | `testarch/atdd.md` | Failing Acceptance-Test Driven Development, implementation checklist | Requires approved story + harness |
| `*automate` | `testarch/automate.md` | Prioritized specs, fixtures, README/script updates, DoD summary | Avoid duplicate coverage (see priority matrix) |
| `*ci` | `testarch/ci.md` | CI workflow, selective test scripts, secrets checklist | Platform-aware (GitHub Actions default) |
| `*test-design` | `testarch/test-design.md` | Combined risk assessment, mitigation plan, and coverage strategy | Handles risk scoring and test design in one pass |
| `*trace` | `testarch/trace-requirements.md` | Coverage matrix, recommendations, gate snippet | Requires access to story/tests repositories |
| `*nfr-assess` | `testarch/nfr-assess.md` | NFR assessment report with actions | Focus on security/performance/reliability |
| `*gate` | `testarch/gate.md` | Gate YAML + summary (PASS/CONCERNS/FAIL/WAIVED) | Deterministic decision rules + rationale |
| Command | Task File | Primary Outputs | Notes |
| -------------- | ------------------------------------------------ | ------------------------------------------------------------------- | ------------------------------------------------ |
| `*framework` | `workflows/testarch/framework/instructions.md` | Playwright/Cypress scaffold, `.env.example`, `.nvmrc`, sample specs | Use when no production-ready harness exists |
| `*atdd` | `workflows/testarch/atdd/instructions.md` | Failing acceptance tests + implementation checklist | Requires approved story + harness |
| `*automate` | `workflows/testarch/automate/instructions.md` | Prioritized specs, fixtures, README/script updates, DoD summary | Avoid duplicate coverage (see priority matrix) |
| `*ci` | `workflows/testarch/ci/instructions.md` | CI workflow, selective test scripts, secrets checklist | Platform-aware (GitHub Actions default) |
| `*test-design` | `workflows/testarch/test-design/instructions.md` | Combined risk assessment, mitigation plan, and coverage strategy | Handles risk scoring and test design in one pass |
| `*trace` | `workflows/testarch/trace/instructions.md` | Coverage matrix, recommendations, gate snippet | Requires access to story/tests repositories |
| `*nfr-assess` | `workflows/testarch/nfr-assess/instructions.md` | NFR assessment report with actions | Focus on security/performance/reliability |
| `*gate` | `workflows/testarch/gate/instructions.md` | Gate YAML + summary (PASS/CONCERNS/FAIL/WAIVED) | Deterministic decision rules + rationale |
<details>
<summary>Command Guidance and Context Loading</summary>
- Each task reads one row from `tea-commands.csv` via `command_key`, expanding pipe-delimited (`|`) values into checklists.
- Keep CSV rows lightweight; place in-depth heuristics in `tea-knowledge.md` and reference via `knowledge_tags`.
- If the CSV grows substantially, consider splitting into scoped registries (e.g., planning vs execution) or upgrading to Markdown tables for humans.
- `tea-knowledge.md` encapsulates Murat's philosophy—update both CSV and knowledge file together to avoid drift.
- Each task now carries its own preflight/flow/deliverable guidance inline.
- `tea-index.csv` maps workflow needs to knowledge fragments; keep tags accurate as you add guidance.
- Consider future modularization into orchestrated workflows if additional automation is needed.
- Update the fragment markdown files alongside workflow edits so guidance and outputs stay in sync.
</details>
## Workflow Placement
The TEA stack has three tightly linked layers:
1. **Agent spec (`agents/tea.md`)** declares the persona, critical actions, and the `run-workflow` entries for every TEA command. Critical actions instruct the agent to load `tea-index.csv` and then fetch only the fragments it needs from `knowledge/` before giving guidance.
2. **Knowledge index (`tea-index.csv`)** catalogues each fragment with tags and file paths. Workflows call out the IDs they need (e.g., `risk-governance`, `fixture-architecture`) so the agent loads targeted guidance instead of a monolithic brief.
3. **Workflows (`workflows/testarch/*`)** contain the task flows and reference `tea-index.csv` in their `<flow>`/`<notes>` sections to request specific fragments. Keeping all workflows in this directory ensures consistent discovery during planning (`*framework`), implementation (`*atdd`, `*automate`, `*trace`), and release (`*nfr-assess`, `*gate`).
This separation lets us expand the knowledge base without touching agent wiring and keeps every command remote-controllable via the standard BMAD workflow runner. As navigation improves, we can add lightweight entrypoints or tags in the index without changing where workflows live.
## Appendix
- **Supporting Knowledge:**
- `tea-knowledge.md` – Murat's testing philosophy, heuristics, and risk scales.
- `test-levels-framework.md` – Decision matrix for unit/integration/E2E selection.
- `test-priorities-matrix.md` – Priority (P0–P3) criteria and target coverage percentages.
- `tea-index.csv` – Catalog of knowledge fragments with tags and file paths under `knowledge/` for task-specific loading.
- `knowledge/*.md` – Focused summaries (fixtures, network, CI, levels, priorities, etc.) distilled from Murat's external resources.
- `test-resources-for-ai-flat.txt` – Raw 347KB archive retained for manual deep dives when a fragment needs source validation.

View File

@@ -1,40 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Acceptance TDD v2.0 (Slim)
```xml
<task id="bmad/bmm/testarch/tdd" name="Acceptance Test Driven Development">
<llm critical="true">
<i>Set command_key="*tdd"</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-commands.csv and parse the row where command equals command_key</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-knowledge.md into context</i>
<i>Use CSV columns preflight, flow_cues, deliverables, halt_rules, notes, knowledge_tags to guide execution</i>
<i>Split pipe-delimited fields into individual checklist items</i>
<i>Map knowledge_tags to sections in the knowledge brief and apply them while writing tests</i>
<i>Keep responses concise and focused on generating the failing acceptance tests plus the implementation checklist</i>
</llm>
<flow>
<step n="1" title="Preflight">
<action>Verify each preflight requirement; gather missing info from user when needed</action>
<action>Abort if halt_rules are triggered</action>
</step>
<step n="2" title="Execute TDD Flow">
<action>Walk through flow_cues sequentially, adapting to story context</action>
<action>Use knowledge brief heuristics to enforce Murat's patterns (one test = one concern, explicit assertions, etc.)</action>
</step>
<step n="3" title="Deliverables">
<action>Produce artifacts described in deliverables</action>
<action>Summarize failing tests and checklist items for the developer</action>
</step>
</flow>
<halt>
<i>Apply halt_rules from the CSV row exactly</i>
</halt>
<notes>
<i>Use the notes column for additional constraints or reminders</i>
</notes>
<output>
<i>Failing acceptance test files + implementation checklist summary</i>
</output>
</task>
```

View File

@@ -1,38 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Automation Expansion v2.0 (Slim)
```xml
<task id="bmad/bmm/testarch/automate" name="Automation Expansion">
<llm critical="true">
<i>Set command_key="*automate"</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-commands.csv and read the row where command equals command_key</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-knowledge.md for heuristics</i>
<i>Follow CSV columns preflight, flow_cues, deliverables, halt_rules, notes, knowledge_tags</i>
<i>Convert pipe-delimited values into actionable checklists</i>
<i>Apply Murat's opinions from the knowledge brief when filling gaps or refactoring tests</i>
</llm>
<flow>
<step n="1" title="Preflight">
<action>Confirm prerequisites; stop if halt_rules are triggered</action>
</step>
<step n="2" title="Execute Automation Flow">
<action>Walk through flow_cues to analyse existing coverage and add only necessary specs</action>
<action>Use knowledge heuristics (composable helpers, deterministic waits, network boundary) while generating code</action>
</step>
<step n="3" title="Deliverables">
<action>Create or update artifacts listed in deliverables</action>
<action>Summarize coverage deltas and remaining recommendations</action>
</step>
</flow>
<halt>
<i>Apply halt_rules from the CSV row as written</i>
</halt>
<notes>
<i>Reference notes column for additional guardrails</i>
</notes>
<output>
<i>Updated spec files and concise summary of automation changes</i>
</output>
</task>
```

View File

@@ -1,39 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# CI/CD Enablement v2.0 (Slim)
```xml
<task id="bmad/bmm/testarch/ci" name="CI/CD Enablement">
<llm critical="true">
<i>Set command_key="*ci"</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-commands.csv and read the row where command equals command_key</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-knowledge.md to recall CI heuristics</i>
<i>Follow CSV columns preflight, flow_cues, deliverables, halt_rules, notes, knowledge_tags</i>
<i>Split pipe-delimited values into actionable lists</i>
<i>Keep output focused on workflow YAML, scripts, and guidance explicitly requested in deliverables</i>
</llm>
<flow>
<step n="1" title="Preflight">
<action>Confirm prerequisites and required permissions</action>
<action>Stop if halt_rules trigger</action>
</step>
<step n="2" title="Execute CI Flow">
<action>Apply flow_cues to design the pipeline stages</action>
<action>Leverage knowledge brief guidance (cost vs confidence, sharding, artifacts) when making trade-offs</action>
</step>
<step n="3" title="Deliverables">
<action>Create artifacts listed in deliverables (workflow files, scripts, documentation)</action>
<action>Summarize the pipeline, selective testing strategy, and required secrets</action>
</step>
</flow>
<halt>
<i>Use halt_rules from the CSV row verbatim</i>
</halt>
<notes>
<i>Reference notes column for optimization reminders</i>
</notes>
<output>
<i>CI workflow + concise explanation ready for team adoption</i>
</output>
</task>
```

View File

@@ -1,41 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Test Framework Setup v2.0 (Slim)
```xml
<task id="bmad/bmm/testarch/framework" name="Test Framework Setup">
<llm critical="true">
<i>Set command_key="*framework"</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-commands.csv and parse the row where command equals command_key</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-knowledge.md into internal memory</i>
<i>Use the CSV columns preflight, flow_cues, deliverables, halt_rules, notes, knowledge_tags to guide behaviour</i>
<i>Split pipe-delimited values (|) into individual checklist items</i>
<i>Map knowledge_tags to matching sections in the knowledge brief and apply those heuristics throughout execution</i>
<i>DO NOT expand beyond the guidance unless the user supplies extra context; keep instructions lean and adaptive</i>
</llm>
<flow>
<step n="1" title="Run Preflight Checks">
<action>Evaluate each item in preflight; confirm or collect missing information</action>
<action>If any preflight requirement fails, follow halt_rules and stop</action>
</step>
<step n="2" title="Execute Framework Flow">
<action>Follow flow_cues sequence, adapting to the project's stack</action>
<action>When deciding frameworks or patterns, apply relevant heuristics from tea-knowledge.md via knowledge_tags</action>
<action>Keep generated assets minimal—only what the CSV specifies</action>
</step>
<step n="3" title="Finalize Deliverables">
<action>Create artifacts listed in deliverables</action>
<action>Capture a concise summary for the user explaining what was scaffolded</action>
</step>
</flow>
<halt>
<i>Follow halt_rules from the CSV row verbatim</i>
</halt>
<notes>
<i>Use notes column for additional guardrails while executing</i>
</notes>
<output>
<i>Deliverables and summary specified in the CSV row</i>
</output>
</task>
```

View File

@@ -1,38 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Quality Gate v2.0 (Slim)
```xml
<task id="bmad/bmm/testarch/tea-gate" name="Quality Gate">
<llm critical="true">
<i>Set command_key="*gate"</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-commands.csv and read the matching row</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-knowledge.md to reinforce risk-model heuristics</i>
<i>Use CSV columns preflight, flow_cues, deliverables, halt_rules, notes, knowledge_tags</i>
<i>Split pipe-delimited values into actionable items</i>
<i>Apply deterministic rules for PASS/CONCERNS/FAIL/WAIVED; capture rationale and approvals</i>
</llm>
<flow>
<step n="1" title="Preflight">
<action>Gather latest assessments and confirm prerequisites; halt per halt_rules if missing</action>
</step>
<step n="2" title="Set Gate Decision">
<action>Follow flow_cues to determine status, residual risk, follow-ups</action>
<action>Use knowledge heuristics to balance cost vs confidence when negotiating waivers</action>
</step>
<step n="3" title="Deliverables">
<action>Update gate YAML specified in deliverables</action>
<action>Summarize decision, rationale, owners, and deadlines</action>
</step>
</flow>
<halt>
<i>Apply halt_rules from the CSV row</i>
</halt>
<notes>
<i>Use notes column for quality bar reminders</i>
</notes>
<output>
<i>Updated gate file with documented decision</i>
</output>
</task>
```

View File

@@ -0,0 +1,9 @@
# CI Pipeline and Burn-In Strategy
- Stage jobs: install/caching once, run `test-changed` for quick feedback, then shard full suites with `fail-fast: false` so evidence isn't lost.
- Re-run changed specs 5–10x (burn-in) before merging to flush flakes; fail the pipeline on the first inconsistent run.
- Upload artifacts on failure (videos, traces, HAR) and keep retry counts explicit—hidden retries hide instability.
- Use `wait-on` for app startup, enforce time budgets (<10 min per job), and document required secrets alongside workflows.
- Mirror CI scripts locally (`npm run test:ci`, `scripts/burn-in-changed.sh`) so devs reproduce pipeline behaviour exactly.
_Source: Murat CI/CD strategy blog, Playwright/Cypress workflow examples._
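
A minimal burn-in sketch as a Node script, assuming Playwright specs and an `origin/main` comparison branch (the script name and repeat count are illustrative):

```typescript
// scripts/burn-in-changed.ts (hypothetical): rerun changed specs to flush flakes
import { execSync } from 'node:child_process';

const changed = execSync('git diff --name-only origin/main...HEAD')
  .toString()
  .split('\n')
  .filter((file) => /\.spec\.ts$/.test(file));

if (changed.length === 0) process.exit(0);

for (let run = 1; run <= 5; run++) {
  console.log(`burn-in run ${run}/5`);
  // execSync throws on a non-zero exit code, so the pipeline
  // fails on the first inconsistent run
  execSync(`npx playwright test ${changed.join(' ')}`, { stdio: 'inherit' });
}
```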

View File

@@ -0,0 +1,9 @@
# Component Test-Driven Development Loop
- Start every UI change with a failing component spec (`cy.mount` or RTL `render`); ship only after red → green → refactor passes.
- Recreate providers/stores per spec to prevent state bleed and keep parallel runs deterministic.
- Use factories to exercise prop/state permutations; cover accessibility by asserting against roles, labels, and keyboard flows.
- Keep component specs under ~100 lines: split by intent (rendering, state transitions, error messaging) to preserve clarity.
- Pair component tests with visual debugging (Cypress runner, Storybook, Playwright trace viewer) to accelerate diagnosis.
_Source: CCTDD repository, Murat component testing talks._
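
A sketch of the failing-first loop with React Testing Library (component name and copy are hypothetical; Jest-style globals and `@testing-library/jest-dom` matchers assumed):

```typescript
// LoginForm.spec.tsx (hypothetical): written before the component handles validation
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { LoginForm } from './LoginForm';

test('shows a validation error when email is empty', async () => {
  render(<LoginForm />); // fresh render per spec, no state bleed
  await userEvent.click(screen.getByRole('button', { name: /sign in/i }));
  // assert via role so accessibility is covered alongside behaviour
  expect(screen.getByRole('alert')).toHaveTextContent(/email is required/i);
});
```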

View File

@@ -0,0 +1,9 @@
# Contract Testing Essentials (Pact)
- Store consumer contracts beside the integration specs that generate them; version contracts semantically and publish on every CI run.
- Require provider verification before merge; failed verification blocks release and surfaces breaking changes immediately.
- Capture fallback behaviour inside interactions (timeouts, retries, error payloads) so resilience guarantees remain explicit.
- Automate broker housekeeping: tag releases, archive superseded contracts, and expire unused pacts to reduce noise.
- Pair contract suites with API smoke or component tests to validate data mapping and UI rendering in tandem.
_Source: Pact consumer/provider sample repos, Murat contract testing blog._
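
A consumer-side sketch with pact-js's `PactV3` API (consumer/provider names, the provider state, and the endpoint are illustrative; Jest-style `it`/`expect` globals assumed):

```typescript
import { PactV3, MatchersV3 } from '@pact-foundation/pact';

const provider = new PactV3({ consumer: 'web-app', provider: 'users-api' });

it('fetches a user by id', () => {
  provider
    .given('user 1 exists') // provider state, verified on the provider side
    .uponReceiving('a request for user 1')
    .withRequest({ method: 'GET', path: '/users/1' })
    .willRespondWith({
      status: 200,
      body: MatchersV3.like({ id: 1, email: 'user@example.com' }),
    });

  // runs the interaction against a local mock server and records the contract
  return provider.executeTest(async (mockServer) => {
    const res = await fetch(`${mockServer.url}/users/1`);
    expect(res.status).toBe(200);
  });
});
```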

View File

@@ -0,0 +1,9 @@
# Data Factories and API-First Setup
- Prefer factory functions that accept overrides and return complete objects (`createUser(overrides)`)—never rely on static fixtures.
- Seed state through APIs, tasks, or direct DB helpers before visiting the UI; UI-based setup is for validation only.
- Ensure factories generate parallel-safe identifiers (UUIDs, timestamps) and perform cleanup after each test.
- Centralize factory exports to avoid duplication; version them alongside schema changes to catch drift in reviews.
- When working with shared environments, layer feature toggles or targeted cleanup so factories do not clobber concurrent runs.
_Source: Murat Testing Philosophy, blog posts on functional helpers and API-first testing._
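
A sketch of the override pattern (the `User` shape is illustrative):

```typescript
import { faker } from '@faker-js/faker';

type User = { id: string; email: string; role: 'admin' | 'member' };

export const createUser = (overrides: Partial<User> = {}): User => ({
  id: faker.string.uuid(),       // parallel-safe: unique per call
  email: faker.internet.email(),
  role: 'member',
  ...overrides,                  // tests override only what they assert on
});

// usage: const admin = createUser({ role: 'admin' });
```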

View File

@@ -0,0 +1,9 @@
# Email-Based Authentication Testing
- Use services like Mailosaur or in-house SMTP capture; extract magic links via regex or HTML parsing helpers.
- Preserve browser storage (local/session) when processing links—restore state before visiting the authenticated page.
- Cache email payloads with `cypress-data-session` or equivalent so retries don't exhaust inbox quotas.
- Cover negative cases: expired links, reused links, and multiple requests in rapid succession.
- Ensure the workflow logs the email ID and link for troubleshooting, but scrub PII before committing artifacts.
_Source: Email authentication blog, Murat testing toolkit._
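
A sketch using the Mailosaur Node client inside an async test body (server id, inbox address, and link position are assumptions):

```typescript
import MailosaurClient from 'mailosaur';

const client = new MailosaurClient(process.env.MAILOSAUR_API_KEY!);
const serverId = process.env.MAILOSAUR_SERVER_ID!;
const inbox = `magic-link-test@${serverId}.mailosaur.net`; // hypothetical inbox

// messages.get waits for a matching message to arrive, then returns it
const message = await client.messages.get(serverId, { sentTo: inbox });
const magicLink = message.html?.links?.[0]?.href; // assumes the first link is the magic link
```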

View File

@@ -0,0 +1,9 @@
# Error Handling and Resilience Checks
- Treat expected failures explicitly: intercept network errors and assert UI fallbacks (`error-message` visible, retries triggered).
- In Cypress, use scoped `Cypress.on('uncaught:exception')` to ignore known errors; rethrow anything else so regressions fail.
- In Playwright, hook `page.on('pageerror')` and only swallow the specific, documented error messages.
- Test retry/backoff logic by forcing sequential failures (e.g., 500, timeout, success) and asserting telemetry gets recorded.
- Log captured errors with context (request payload, user/session) but redact secrets to keep artifacts safe for sharing.
_Source: Murat error-handling patterns, Pact resilience guidance._
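
A Cypress-side sketch of the scoped handler (the ignored message is only an example of a known, documented error):

```typescript
// in the affected spec file, not global support code: keep the scope narrow
Cypress.on('uncaught:exception', (err) => {
  if (err.message.includes('ResizeObserver loop limit exceeded')) {
    return false; // documented noise: returning false keeps the test green
  }
  // any other error falls through and fails the test, as it should
});
```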

View File

@@ -0,0 +1,9 @@
# Feature Flag Governance
- Centralize flag definitions in a frozen enum; expose helpers to set, clear, and target specific audiences.
- Test both enabled and disabled states in CI; clean up targeting after each spec to keep shared environments stable.
- For LaunchDarkly-style systems, script API helpers to seed variations instead of mutating via UI.
- Maintain a checklist for new flags: default state, owners, expiry date, telemetry, rollback plan.
- Document flag dependencies in story/PR templates so QA and release reviews know which toggles must flip before launch.
_Source: LaunchDarkly strategy blog, Murat test architecture notes._
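
A sketch of the frozen enum plus a targeting helper (the flag key, endpoint, and `setFlagForUser` helper are hypothetical; a real implementation would call the flag service's management API):

```typescript
export const FLAGS = Object.freeze({
  NEW_CHECKOUT: 'new-checkout', // hypothetical flag key
} as const);

type FlagKey = (typeof FLAGS)[keyof typeof FLAGS];

// seed a variation through the management API instead of mutating via UI
export async function setFlagForUser(flag: FlagKey, userKey: string, enabled: boolean): Promise<void> {
  await fetch(`https://flags.example.com/api/flags/${flag}/targeting`, {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ userKey, enabled }),
  });
}

// cleanup after each spec keeps shared environments stable, e.g.:
// afterEach(() => setFlagForUser(FLAGS.NEW_CHECKOUT, userKey, false));
```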

View File

@@ -0,0 +1,9 @@
# Fixture Architecture Playbook
- Build helpers as pure functions first, then expose them via Playwright `extend` or Cypress commands so logic stays testable in isolation.
- Compose capabilities with `mergeTests` (Playwright) or layered Cypress commands instead of inheritance; each fixture should solve one concern (auth, api, logs, network).
- Keep HTTP helpers framework agnostic—accept all required params explicitly and return results so unit tests and runtime fixtures can share them.
- Export fixtures through package subpaths (`"./api-request"`, `"./api-request/fixtures"`) to make reuse trivial across suites and projects.
- Treat fixture files as infrastructure: document dependencies, enforce deterministic timeouts, and ban hidden retries that mask flakiness.
_Source: Murat Testing Philosophy, cy-vs-pw comparison, SEON production patterns._

View File

@@ -0,0 +1,9 @@
# Network-First Safeguards
- Register interceptions before any navigation or user action; store the promise and await it immediately after the triggering step.
- Assert on structured responses (status, body schema, headers) instead of generic waits so failures surface with actionable context.
- Capture HAR files or Playwright traces on successful runs—reuse them for deterministic CI playback when upstream services flake.
- Prefer edge mocking: stub at service boundaries, never deep within the stack unless risk analysis demands it.
- Replace implicit waits with deterministic signals like `waitForResponse`, disappearance of spinners, or event hooks.
_Source: Murat Testing Philosophy, Playwright patterns book, blog on network interception._
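
A Playwright sketch of intercept-before-navigate (the route and response shape are illustrative):

```typescript
import { test, expect } from '@playwright/test';

test('dashboard loads its users', async ({ page }) => {
  // register BEFORE navigation so the response cannot be missed
  const usersCall = page.waitForResponse(
    (res) => res.url().includes('/api/users') && res.status() === 200,
  );
  await page.goto('/dashboard');
  const response = await usersCall; // deterministic wait, no fixed sleeps

  const body = await response.json();
  expect(Array.isArray(body.users)).toBe(true); // assert on structure, not timing
});
```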

View File

@@ -0,0 +1,21 @@
# Non-Functional Review Criteria
- **Security**
  - PASS: auth/authz, secret handling, and threat mitigations in place.
  - CONCERNS: minor gaps with clear owners.
  - FAIL: critical exposure or missing controls.
- **Performance**
  - PASS: metrics meet targets with profiling evidence.
  - CONCERNS: trending toward limits or missing baselines.
  - FAIL: breaches SLO/SLA or introduces resource leaks.
- **Reliability**
  - PASS: error handling, retries, health checks verified.
  - CONCERNS: partial coverage or missing telemetry.
  - FAIL: no recovery path or crash scenarios unresolved.
- **Maintainability**
  - PASS: clean code, tests, and documentation shipped together.
  - CONCERNS: duplication, low coverage, or unclear ownership.
  - FAIL: absent tests, tangled implementations, or no observability.
- Default to CONCERNS when targets or evidence are undefined—force the team to clarify before sign-off.
_Source: Murat NFR assessment guidance._

View File

@@ -0,0 +1,9 @@
# Playwright Configuration Guardrails
- Load environment configs via a central map (`envConfigMap`) and fail fast when `TEST_ENV` is missing or unsupported.
- Standardize timeouts: action 15s, navigation 30s, expect 10s, test 60s; expose overrides through fixtures rather than inline literals.
- Emit HTML + JUnit reporters, disable auto-open, and store artifacts under `test-results/` for CI upload.
- Keep `.env.example`, `.nvmrc`, and browser dependencies versioned so local and CI runs stay aligned.
- Use global setup for shared auth tokens or seeding, but prefer per-test fixtures for anything mutable to avoid cross-test leakage.
_Source: Playwright book repo, SEON configuration example._
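
The timeout and reporter standards above, as a `playwright.config.ts` sketch (values copied from the guardrails; the `envConfigMap` lookup is elided):

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  timeout: 60_000,              // test timeout
  expect: { timeout: 10_000 },  // expect timeout
  use: {
    actionTimeout: 15_000,
    navigationTimeout: 30_000,
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  reporter: [
    ['html', { open: 'never' }], // never auto-open
    ['junit', { outputFile: 'test-results/junit.xml' }],
  ],
  outputDir: 'test-results',
});
```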

View File

@@ -0,0 +1,17 @@
# Probability and Impact Scale
- **Probability**
  - 1 – Unlikely: standard implementation, low uncertainty.
  - 2 – Possible: edge cases or partial unknowns worth investigation.
  - 3 – Likely: known issues, new integrations, or high ambiguity.
- **Impact**
  - 1 – Minor: cosmetic issues or easy workarounds.
  - 2 – Degraded: partial feature loss or manual workaround required.
  - 3 – Critical: blockers, data/security/regulatory exposure.
- Multiply probability × impact to derive the risk score.
  - 1–3: document for awareness.
  - 4–5: monitor closely, plan mitigations.
  - 6–8: CONCERNS at the gate until mitigations are implemented.
  - 9: automatic gate FAIL until resolved or formally waived.
_Source: Murat risk model summary._
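
The scale reduces to a small deterministic function; a sketch:

```typescript
type Score = 1 | 2 | 3;

// returns the posture implied by the thresholds above
function riskPosture(probability: Score, impact: Score): string {
  const score = probability * impact;
  if (score === 9) return 'FAIL until resolved or formally waived';
  if (score >= 6) return 'CONCERNS until mitigations are implemented';
  if (score >= 4) return 'monitor closely, plan mitigations';
  return 'document for awareness';
}

// riskPosture(2, 3) === 'CONCERNS until mitigations are implemented'
```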

View File

@@ -0,0 +1,14 @@
# Risk Governance and Gatekeeping
- Score risk as probability (1–3) × impact (1–3); totals ≥6 demand mitigation before approval, 9 mandates a gate failure.
- Classify risks across TECH, SEC, PERF, DATA, BUS, OPS. Document owners, mitigation plans, and deadlines for any score above 4.
- Trace every acceptance criterion to implemented tests; missing coverage must be resolved or explicitly waived before release.
- Gate decisions:
  - **PASS** – no critical issues remain and evidence is current.
  - **CONCERNS** – residual risk exists but has owners, actions, and timelines.
  - **FAIL** – critical issues unresolved or evidence missing.
  - **WAIVED** – risk accepted with documented approver, rationale, and expiry.
- Maintain a gate history log capturing updates so auditors can follow the decision trail.
- Use the probability/impact scale fragment for shared definitions when scoring teams run the matrix.
_Source: Murat risk governance notes, gate schema guidance._

View File

@@ -0,0 +1,9 @@
# Selective and Targeted Test Execution
- Use tags/grep (`--grep "@smoke"`, `--grep "@critical"`) to slice suites by risk, not directory.
- Filter by spec patterns (`--spec "**/*checkout*"`) or git diff (`npm run test:changed`) to focus on impacted areas.
- Combine priority metadata (P0–P3) with change detection to decide which levels to run pre-commit vs. in CI.
- Record burn-in history for newly added specs; promote to main suite only after consistent green runs.
- Document the selection strategy in README/CI so the team understands when full regression is mandatory.
_Source: 32+ selective testing strategies blog, Murat testing philosophy._
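
Tag-based slicing in practice, assuming tags live in test titles (the spec and tag names are illustrative):

```typescript
import { test } from '@playwright/test';

// `npx playwright test --grep "@smoke"` selects this spec by risk tag,
// regardless of which directory it lives in
test('checkout happy path @smoke @p0', async ({ page }) => {
  await page.goto('/checkout');
  // ...
});
```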

View File

@@ -0,0 +1,10 @@
# Test Quality Definition of Done
- No hard waits (`waitForTimeout`, `cy.wait(ms)`); rely on deterministic waits or event hooks.
- Each spec <300 lines and executes in ≤1.5 minutes.
- Tests are isolated, parallel-safe, and self-cleaning (seed via API/tasks, teardown after run).
- Assertions stay visible in test bodies; avoid conditional logic controlling test flow.
- Suites must pass locally and in CI with the same commands.
- Promote new tests only after they have failed for the intended reason at least once.
_Source: Murat quality checklist._

View File

@@ -0,0 +1,9 @@
# Visual Debugging and Developer Ergonomics
- Keep Playwright trace viewer, Cypress runner, and Storybook accessible in CI artifacts to speed up reproduction.
- Record short screen captures only-on-failure; pair them with HAR or console logs to avoid guesswork.
- Document common trace navigation steps (network tab, action timeline) so new contributors diagnose issues quickly.
- Encourage live-debug sessions with component harnesses to validate behaviour before writing full E2E specs.
- Integrate accessibility tooling (axe, Playwright audits) into the same debug workflow to catch regressions early.
_Source: Murat DX blog posts, Playwright book appendix on debugging._

View File

@@ -1,38 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# NFR Assessment v2.0 (Slim)
```xml
<task id="bmad/bmm/testarch/nfr-assess" name="NFR Assessment">
<llm critical="true">
<i>Set command_key="*nfr-assess"</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-commands.csv and parse the matching row</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-knowledge.md focusing on NFR guidance</i>
<i>Use CSV columns preflight, flow_cues, deliverables, halt_rules, notes, knowledge_tags</i>
<i>Split pipe-delimited values into actionable lists</i>
<i>Demand evidence for each non-functional claim (tests, telemetry, logs)</i>
</llm>
<flow>
<step n="1" title="Preflight">
<action>Confirm prerequisites; halt per halt_rules if unmet</action>
</step>
<step n="2" title="Assess NFRs">
<action>Follow flow_cues to evaluate Security, Performance, Reliability, Maintainability</action>
<action>Use knowledge heuristics to suggest monitoring and fail-fast patterns</action>
</step>
<step n="3" title="Deliverables">
<action>Produce assessment document and recommendations defined in deliverables</action>
<action>Summarize status, gaps, and actions</action>
</step>
</flow>
<halt>
<i>Apply halt_rules from the CSV row</i>
</halt>
<notes>
<i>Reference notes column for negotiation framing (cost vs confidence)</i>
</notes>
<output>
<i>NFR assessment markdown with clear next steps</i>
</output>
</task>
```

View File

@@ -1,9 +0,0 @@
command,title,when_to_use,preflight,flow_cues,deliverables,halt_rules,notes,knowledge_tags
*automate,Automation expansion,After implementation or when reforging coverage,all acceptance criteria satisfied|code builds locally|framework configured,"Review story source/diff to confirm automation target; ensure fixture architecture exists (mergeTests for Playwright, commands for Cypress) and implement apiRequest/network/auth/log fixtures if missing; map acceptance criteria with test-levels-framework.md guidance and avoid duplicate coverage; assign priorities using test-priorities-matrix.md; generate unit/integration/E2E specs with naming convention feature-name.spec.ts, covering happy, negative, and edge paths; enforce deterministic waits, self-cleaning factories, and <=1.5 minute execution per test; run suite and capture Definition of Done results; update package.json scripts and README instructions",New or enhanced spec files grouped by level; fixture modules under support/; data factory utilities; updated package.json scripts and README notes; DoD summary with remaining gaps; gate-ready coverage summary,"If automation target unclear or framework missing, halt and request clarification",Never create page objects; keep tests <300 lines and stateless; forbid hard waits and conditional flow in tests; co-locate tests near source; flag flaky patterns immediately,philosophy/core|patterns/helpers|patterns/waits|patterns/dod
*ci,CI/CD quality pipeline,Once automation suite exists or needs optimization,git repository initialized|tests pass locally|team agrees on target environments|access to CI platform settings,"Detect CI platform (default GitHub Actions, ask if GitLab/CircleCI/etc); scaffold workflow (.github/workflows/test.yml or platform equivalent) with triggers; set Node.js version from .nvmrc and cache node_modules + browsers; stage jobs: lint -> unit -> component -> e2e with matrix parallelization (shard by file not test); add selective execution script for affected tests; create burn-in job that reruns changed specs 3x to catch flakiness; attach artifacts on failure (traces/videos/HAR); configure retries/backoff and concurrency controls; document required secrets and environment variables; add Slack/email notifications and local script mirroring CI",.github/workflows/test.yml (or platform equivalent); scripts/test-changed.sh; scripts/burn-in-changed.sh; updated README/ci.md instructions; secrets checklist; dashboard or badge configuration,"If git repo absent, test framework missing, or CI platform unspecified, halt and request setup",Target 20x speedups via parallel shards + caching; shard by file; keep jobs under 10 minutes; wait-on-timeout 120s for app startup; ensure npm test locally matches CI run; mention alternative platform paths when not on GitHub,philosophy/core|ci-strategy
*framework,Initialize test architecture,Run once per repo or when no production-ready harness exists,package.json present|no existing E2E framework detected|architectural context available,"Identify stack from package.json (React/Vue/Angular/Next.js); detect bundler (Vite/Webpack/Rollup/esbuild); match test language to source (JS/TS frontend -> JS/TS tests); choose Playwright for large or performance-critical repos, Cypress for small DX-first teams; create {framework}/tests/ and {framework}/support/fixtures/ and {framework}/support/helpers/; configure config files with timeouts (action 15s, navigation 30s, test 60s) and reporters (HTML + JUnit); create .env.example with TEST_ENV, BASE_URL, API_URL; implement pure function->fixture->mergeTests pattern and faker-based data factories; enable failure-only screenshots/videos and ensure .nvmrc recorded",playwright/ or cypress/ folder with config + support tree; .env.example; .nvmrc; example tests; README with setup instructions,"If package.json missing OR framework already configured, halt and instruct manual review","Playwright: worker parallelism, trace viewer, multi-language support; Cypress: avoid if many dependent API calls; Component testing: Vitest (large) or Cypress CT (small); Contract testing: Pact for microservices; always use data-cy/data-testid selectors",philosophy/core|patterns/fixtures|patterns/selectors
*gate,Quality gate decision,After review or mitigation updates,latest assessments gathered|team consensus on fixes,"Assemble story metadata (id, title); choose gate status using deterministic rules (PASS all critical issues resolved, CONCERNS minor residual risk, FAIL critical blockers, WAIVED approved by business); update YAML schema with sections: metadata, waiver status, top_issues, risk_summary totals, recommendations (must_fix, monitor), nfr_validation statuses, history; capture rationale, owners, due dates, and summary comment back to story","docs/qa/gates/{story}.yml updated with schema fields (schema, story, story_title, gate, status_reason, reviewer, updated, waiver, top_issues, risk_summary, recommendations, nfr_validation, history); summary message for team","If review incomplete or risk data outdated, halt and request rerun","FAIL whenever unresolved P0 risks/tests or security holes remain; CONCERNS when mitigations planned but residual risk exists; WAIVED requires reason, approver, and expiry; maintain audit trail in history",philosophy/core|risk-model
*nfr-assess,NFR validation,Late development or pre-review for critical stories,implementation deployed locally|non-functional goals defined or discoverable,"Ask which NFRs to assess; default to core four (security, performance, reliability, maintainability); gather thresholds from story/architecture/technical-preferences and mark unknown targets; inspect evidence (tests, telemetry, logs) for each NFR; classify status using deterministic pass/concerns/fail rules and list quick wins; produce gate block and assessment doc with recommended actions",NFR assessment markdown with findings; gate YAML block capturing statuses and notes; checklist of evidence gaps and follow-up owners,"If NFR targets undefined and no guidance available, request definition and halt","Unknown thresholds -> CONCERNS, never guess; ensure each NFR has evidence or call it out; suggest monitoring hooks and fail-fast mechanisms when gaps exist",philosophy/core|nfr
*tdd,Acceptance Test Driven Development,Before implementation when team commits to TDD,story approved with acceptance criteria|dev sandbox ready|framework scaffolding in place,Clarify acceptance criteria and affected systems; pick appropriate test level (E2E/API/Component); write failing acceptance tests using Given-When-Then with network interception first then navigation; create data factories and fixture stubs for required entities; outline mocks/fixtures infrastructure the dev team must supply; generate component tests for critical UI logic; compile implementation checklist mapping each test to source work; share failing tests with dev agent and maintain red -> green -> refactor loop,Failing acceptance test files; component test stubs; fixture/mocks skeleton; implementation checklist with test-to-code mapping; documented data-testid requirements,"If criteria ambiguous or framework missing, halt for clarification",Start red; one assertion per test; use beforeEach for visible setup (no shared state); remind devs to run tests before writing production code; update checklist as each test goes green,philosophy/core|patterns/test-structure
*test-design,Risk and test design planning,"After story approval, before development",story markdown present|acceptance criteria clear|architecture/PRD accessible,"Filter requirements so only genuine risks remain; review PRD/architecture/story for unresolved gaps; classify risks across TECH, SEC, PERF, DATA, BUS, OPS using category definitions; request clarification when evidence missing; score probability (1 unlikely, 2 possible, 3 likely) and impact (1 minor, 2 degraded, 3 critical) then compute totals; highlight risks >=6 and plan mitigations with owners and timelines; break acceptance criteria into atomic scenarios mapped to mitigations; reference test-levels-framework.md to pick unit/integration/E2E/component levels; avoid duplicate coverage, prefer lower levels when possible; assign priorities using test-priorities-matrix.md; outline data/tooling prerequisites and execution order",Risk assessment markdown in docs/qa/assessments; table of category/probability/impact/score; mitigation matrix with owners and due dates; coverage matrix with requirement/level/priority/mitigation; gate YAML snippet summarizing risk totals and scenario counts; recommended execution order,"If story missing or criteria unclear, halt for clarification","Category definitions: TECH=architecture flaws; SEC=missing controls/vulnerabilities; PERF=SLA risk; DATA=loss/corruption; BUS=user/business harm; OPS=deployment/run failures; rely on evidence, not speculation; tie scenarios back to risk mitigations; keep scenarios independent and maintainable",philosophy/core|risk-model|patterns/test-structure
*trace,Requirements traceability,Mid-development checkpoint or before review,tests exist for story|access to source + specs,"Gather acceptance criteria and implemented tests; map each criterion to concrete tests (file + describe/it) using Given-When-Then narrative; classify coverage status as FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY; flag severity based on priority (P0 gaps critical); recommend additional tests or refactors; generate gate YAML coverage summary",Traceability report saved under docs/qa/assessments; coverage matrix with status per criterion; gate YAML snippet for coverage totals and gaps,"If story lacks implemented tests, pause and advise running *tdd or writing tests","Definitions: FULL=all scenarios validated, PARTIAL=some coverage exists, NONE=no validation, UNIT-ONLY=missing higher level, INTEGRATION-ONLY=lacks lower confidence; ensure assertions explicit and avoid duplicate coverage",philosophy/core|patterns/assertions

View File

@@ -0,0 +1,19 @@
id,name,description,tags,fragment_file
fixture-architecture,Fixture Architecture,"Composable fixture patterns (pure function → fixture → merge) and reuse rules","fixtures,architecture,playwright,cypress",knowledge/fixture-architecture.md
network-first,Network-First Safeguards,"Intercept-before-navigate workflow, HAR capture, deterministic waits, edge mocking","network,stability,playwright,cypress",knowledge/network-first.md
data-factories,Data Factories and API Setup,"Factories with overrides, API seeding, cleanup discipline","data,factories,setup,api",knowledge/data-factories.md
component-tdd,Component TDD Loop,"Red→green→refactor workflow, provider isolation, accessibility assertions","component-testing,tdd,ui",knowledge/component-tdd.md
playwright-config,Playwright Config Guardrails,"Environment switching, timeout standards, artifact outputs","playwright,config,env",knowledge/playwright-config.md
ci-burn-in,CI and Burn-In Strategy,"Staged jobs, shard orchestration, burn-in loops, artifact policy","ci,automation,flakiness",knowledge/ci-burn-in.md
selective-testing,Selective Test Execution,"Tag/grep usage, spec filters, diff-based runs, promotion rules","risk-based,selection,strategy",knowledge/selective-testing.md
feature-flags,Feature Flag Governance,"Enum management, targeting helpers, cleanup, release checklists","feature-flags,governance,launchdarkly",knowledge/feature-flags.md
contract-testing,Contract Testing Essentials,"Pact publishing, provider verification, resilience coverage","contract-testing,pact,api",knowledge/contract-testing.md
email-auth,Email Authentication Testing,"Magic link extraction, state preservation, caching, negative flows","email-authentication,security,workflow",knowledge/email-auth.md
error-handling,Error Handling Checks,"Scoped exception handling, retry validation, telemetry logging","resilience,error-handling,stability",knowledge/error-handling.md
visual-debugging,Visual Debugging Toolkit,"Trace viewer usage, artifact expectations, accessibility integration","debugging,dx,tooling",knowledge/visual-debugging.md
risk-governance,Risk Governance,"Scoring matrix, category ownership, gate decision rules","risk,governance,gates",knowledge/risk-governance.md
probability-impact,Probability and Impact Scale,"Shared definitions for scoring matrix and gate thresholds","risk,scoring,scale",knowledge/probability-impact.md
test-quality,Test Quality Definition of Done,"Execution limits, isolation rules, green criteria","quality,definition-of-done,tests",knowledge/test-quality.md
nfr-criteria,NFR Review Criteria,"Security, performance, reliability, maintainability status definitions","nfr,assessment,quality",knowledge/nfr-criteria.md
test-levels,Test Levels Framework,"Guidelines for choosing unit, integration, or end-to-end coverage","testing,levels,selection",knowledge/test-levels-framework.md
test-priorities,Test Priorities Matrix,"P0–P3 criteria, coverage targets, execution ordering","testing,prioritization,risk",knowledge/test-priorities-matrix.md

View File

@@ -1,275 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Murat Test Architecture Foundations (Slim Brief)
This brief distills Murat Ozcan's testing philosophy used by the Test Architect agent. Use it as the north star after loading `tea-commands.csv`.
## Core Principles
- Cost vs confidence: cost = creation + execution + maintenance. Push confidence where impact is highest and skip redundant checks.
- Engineering assumes failure: predict what breaks, defend with tests, learn from every failure. A single failing test means the software is not ready.
- Quality is team work. Story estimates include testing, documentation, and deployment work required to ship safely.
- Missing test coverage is feature debt (hurts customers), not mere tech debt—treat it with the same urgency as functionality gaps.
- Shared mutable state is the source of all evil: design fixtures and helpers so each test owns its data.
- Composition over inheritance: prefer functional helpers and fixtures that compose behaviour; page objects and deep class trees hide duplication.
- Setup via API, assert via UI. Keep tests user-centric while priming state through fast interfaces.
- One test = one concern. Explicit assertions live in the test body, not buried in helpers.
## Patterns and Heuristics
- Selector order: `data-cy` / `data-testid` -> ARIA -> text. Avoid brittle CSS, IDs, or index based locators.
- Network boundary is the mock boundary. Stub at the edge, never mid-service unless risk demands.
- **Network-first pattern**: ALWAYS intercept before navigation: `const call = interceptNetwork(); await page.goto(); await call;`
- Deterministic waits only: await specific network responses, elements disappearing, or event hooks. Ban fixed sleeps.
- **Fixture architecture (The Murat Way)**:
```typescript
import { test as base, mergeTests } from '@playwright/test';
// authFixture and networkFixture are assumed to be defined in sibling fixture modules

// 1. Pure function first (testable independently)
export async function apiRequest({ request, method, url, data }) {
  /* implementation */
}

// 2. Fixture wrapper
export const apiRequestFixture = base.extend({
  apiRequest: async ({ request }, use) => {
    await use((params) => apiRequest({ request, ...params }));
  },
});

// 3. Compose via mergeTests
export const test = mergeTests(base, apiRequestFixture, authFixture, networkFixture);
```
- **Data factories pattern**:
```typescript
import { faker } from '@faker-js/faker';

export const createUser = (overrides = {}) => ({
  id: faker.string.uuid(),
  email: faker.internet.email(),
  ...overrides,
});
```
- Visual debugging: keep component/test runner UIs available (Playwright trace viewer, Cypress runner) to accelerate feedback.
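A minimal Playwright sketch of the network-first pattern above (route, page, and test-id names are illustrative assumptions):
```typescript
import { test, expect } from '@playwright/test';

test('network-first: register the wait before navigating', async ({ page }) => {
  // Intercept BEFORE navigation so the response cannot race past the listener.
  const usersCall = page.waitForResponse(
    (res) => res.url().includes('/api/users') && res.ok(),
  );
  await page.goto('/dashboard');
  await usersCall; // deterministic wait — no fixed sleeps
  await expect(page.getByTestId('user-list')).toBeVisible();
});
```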
## Risk and Coverage
- Risk score = probability (1-3) × impact (1-3). Score 9 => gate FAIL, 6-8 => CONCERNS. Most stories have 0-1 high risks.
- Test level ratio: heavy unit/component coverage, but always include E2E for critical journeys and integration seams.
- Traceability reflects reality: map each acceptance criterion to concrete tests, flagging missing coverage and duplicated value.
- NFR focus areas: Security, Performance, Reliability, Maintainability. Demand evidence (tests, telemetry, alerts) before approving.
## Test Configuration
- **Timeouts**: actionTimeout 15s, navigationTimeout 30s, testTimeout 60s, expectTimeout 10s
- **Reporters**: HTML (never auto-open) + JUnit XML for CI integration
- **Media**: screenshot only-on-failure, video retain-on-failure
- **Language Matching**: Tests should match source code language (JS/TS frontend -> JS/TS tests)
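As a sketch, those defaults map onto a Playwright config roughly like this (output paths are assumptions):
```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  timeout: 60_000,             // testTimeout
  expect: { timeout: 10_000 }, // expectTimeout
  use: {
    actionTimeout: 15_000,
    navigationTimeout: 30_000,
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  reporter: [
    ['html', { open: 'never' }],                    // never auto-open
    ['junit', { outputFile: 'results/junit.xml' }], // CI integration
  ],
});
```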
## Automation and CI
- Prefer Playwright for multi-language teams, worker parallelism, rich debugging; Cypress suits smaller DX-first repos or component-heavy spikes.
- **Framework Selection**: Large repo + performance = Playwright, Small repo + DX = Cypress
- **Component Testing**: Large repos = Vitest (has UI, easy RTL conversion), Small repos = Cypress CT
- CI pipelines run lint -> unit -> component -> e2e, with selective reruns for flakes and artifacts (videos, traces) on failure.
- Shard suites to keep feedback tight; treat CI as shared safety net, not a bottleneck.
- Test selection ideas (32+ strategies): filter by tags/grep (`npm run test -- --grep "@smoke"`), file patterns (`--spec "**/*checkout*"`), changed files (`npm run test:changed`), or test level (`npm run test:unit` / `npm run test:e2e`); a tag-based sketch follows this list.
- Burn-in testing: run new or changed specs multiple times (e.g., 3-10x) to flush flakes before they land in main.
- Keep helper scripts handy (`scripts/test-changed.sh`, `scripts/burn-in-changed.sh`) so CI and local workflows stay in sync.
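For tag-based selection, one minimal pattern is to tag test titles and filter with `--grep` (tag names here are illustrative):
```typescript
import { test, expect } from '@playwright/test';

// Run only this subset with: npx playwright test --grep "@smoke"
test('checkout happy path @smoke @checkout', async ({ page }) => {
  await page.goto('/checkout');
  await expect(page.getByRole('heading', { name: 'Checkout' })).toBeVisible();
});
```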
## Project Structure and Config
- **Directory structure**:
```
project/
├── playwright.config.ts # Environment-based config loading
├── playwright/
│ ├── tests/ # All specs (group by domain: auth/, network/, feature-flags/…)
│ ├── support/ # Frequently touched helpers (global-setup, merged-fixtures, ui helpers, factories)
│ ├── config/ # Environment configs (base, local, staging, production)
│ └── scripts/ # Expert utilities (burn-in, record/playback, maintenance)
```
- **Environment config pattern**:
```javascript
import localConfig from './config/local.config';
import stagingConfig from './config/staging.config';
import prodConfig from './config/prod.config';

const configs = {
  local: localConfig,
  staging: stagingConfig,
  prod: prodConfig,
};

export default configs[process.env.TEST_ENV || 'local'];
```
## Test Hygiene and Independence
- Tests must be independent and stateless; never rely on execution order.
- Cleanup all data created during tests (afterEach or API cleanup).
- Ensure idempotency: same results every run.
- No shared mutable state; prefer factory functions per test.
- Tests must run in parallel safely; never commit `.only`.
- Prefer co-location: component tests next to components, integration in `tests/integration`, etc.
- Feature flags: centralise enum definitions (e.g., `export const FLAGS = Object.freeze({ NEW_FEATURE: 'new-feature' })`), provide helpers to set/clear targeting, and write dedicated flag tests that clean up targeting after each run.
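A sketch of that flag hygiene, assuming hypothetical targeting helpers wired to your flag provider:
```typescript
import { test } from '@playwright/test';

// Centralised enum: one source of truth for flag keys.
export const FLAGS = Object.freeze({ NEW_CHECKOUT: 'new-checkout' } as const);

// Hypothetical helpers — adapt to your provider's targeting API.
async function setFlagTargeting(flag: string, userId: string, on: boolean): Promise<void> {
  /* e.g. POST to the flag service's targeting endpoint */
}
async function clearFlagTargeting(flag: string, userId: string): Promise<void> {
  /* e.g. DELETE the rule created in setup */
}

const userId = 'flag-test-user';

test.beforeEach(async () => {
  await setFlagTargeting(FLAGS.NEW_CHECKOUT, userId, true);
});

test.afterEach(async () => {
  await clearFlagTargeting(FLAGS.NEW_CHECKOUT, userId); // never leak targeting between runs
});
```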
## CCTDD (Component Test-Driven Development)
- Start with failing component test -> implement minimal component -> refactor.
- Component tests catch ~70% of bugs before integration.
- Use `cy.mount()` or `render()` to test components in isolation; focus on user interactions.
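A minimal red-step sketch, assuming React Testing Library with Vitest (the component and copy are placeholders):
```tsx
import { test, expect } from 'vitest';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import '@testing-library/jest-dom/vitest';
import { Counter } from './Counter'; // does not exist yet — the test fails first

test('Counter increments on click', async () => {
  render(<Counter />);
  await userEvent.click(screen.getByRole('button', { name: /increment/i }));
  expect(screen.getByText('Count: 1')).toBeInTheDocument();
});
```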
## CI Optimization Strategies
- **Parallel execution**: Split by test file, not test case.
- **Smart selection**: Run only tests affected by changes (dependency graphs, git diff).
- **Burn-in testing**: Run new/modified tests 3x to catch flakiness early.
- **HAR recording**: Record network traffic for offline playback in CI.
- **Selective reruns**: Only rerun failed specs, not entire suite.
- **Network recording**: capture HAR files during stable runs so CI can replay network traffic when external systems are flaky.
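A Playwright sketch of HAR playback (the HAR path, URL glob, and env variable are assumptions):
```typescript
import { test } from '@playwright/test';

test('dashboard works against recorded traffic', async ({ page }) => {
  // Record on a stable run with UPDATE_HAR=true, then replay offline in CI.
  await page.routeFromHAR('hars/dashboard.har', {
    url: '**/api/**',
    update: process.env.UPDATE_HAR === 'true',
  });
  await page.goto('/dashboard');
});
```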
## Package Scripts
- **Essential npm scripts**:
```json
"test:e2e": "playwright test",
"test:unit": "vitest run",
"test:component": "cypress run --component",
"test:contract": "jest --testMatch='**/pact/*.spec.ts'",
"test:debug": "playwright test --headed",
"test:ci": "npm run test:unit andand npm run test:e2e",
"contract:publish": "pact-broker publish"
```
## Contract Testing (Pact)
- Use for microservices with integration points.
- Consumer generates contracts, provider verifies.
- Structure: `pact/` directory at root, `pact/config.ts` for broker settings.
- Reference repos: pact-js-example-consumer, pact-js-example-provider, pact-js-example-react-consumer.
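A consumer-side sketch using the pact-js V3 API (service names and the endpoint are illustrative; `it`/`expect` come from the Jest runner configured in `test:contract`):
```typescript
import path from 'node:path';
import { PactV3, MatchersV3 } from '@pact-foundation/pact';

const provider = new PactV3({
  dir: path.resolve('pact/pacts'),
  consumer: 'web-app',
  provider: 'user-service',
});

it('fetches a user', async () => {
  provider
    .given('a user with id 1 exists')
    .uponReceiving('a request for user 1')
    .withRequest({ method: 'GET', path: '/users/1' })
    .willRespondWith({
      status: 200,
      body: MatchersV3.like({ id: 1, email: 'user@example.com' }),
    });

  await provider.executeTest(async (mockServer) => {
    const res = await fetch(`${mockServer.url}/users/1`);
    expect(res.status).toBe(200);
  });
});
```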
## Online Resources and Examples
- Fixture architecture: https://github.com/muratkeremozcan/cy-vs-pw-murats-version
- Playwright patterns: https://github.com/muratkeremozcan/pw-book
- Component testing (CCTDD): https://github.com/muratkeremozcan/cctdd
- Contract testing: https://github.com/muratkeremozcan/pact-js-example-consumer
- Full app example: https://github.com/muratkeremozcan/tour-of-heroes-react-vite-cypress-ts
- Blog posts: https://dev.to/muratkeremozcan
## Risk Model Details
- TECH: Unmitigated architecture flaws, experimental patterns without fallbacks.
- SEC: Missing security controls, potential vulnerabilities, unsafe data handling.
- PERF: SLA-breaking slowdowns, resource exhaustion, lack of caching.
- DATA: Loss or corruption scenarios, migrations without rollback, inconsistent schemas.
- BUS: Business or user harm, revenue-impacting failures, compliance gaps.
- OPS: Deployment, infrastructure, or observability gaps that block releases.
## Probability and Impact Scale
- Probability 1 = Unlikely (standard implementation, low risk).
- Probability 2 = Possible (edge cases, needs attention).
- Probability 3 = Likely (known issues, high uncertainty).
- Impact 1 = Minor (cosmetic, easy workaround).
- Impact 2 = Degraded (partial feature loss, manual workaround needed).
- Impact 3 = Critical (blocker, data/security/regulatory impact).
- Scores: 9 => FAIL, 6-8 => CONCERNS, 4 => monitor, 1-3 => note only.
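The scoring rules above reduce to a tiny lookup; a sketch:
```typescript
type Level = 1 | 2 | 3;

function gateForRisk(probability: Level, impact: Level): 'FAIL' | 'CONCERNS' | 'monitor' | 'note only' {
  const score = probability * impact; // possible scores: 1, 2, 3, 4, 6, 9
  if (score === 9) return 'FAIL';
  if (score >= 6) return 'CONCERNS';
  if (score >= 4) return 'monitor';
  return 'note only';
}

// e.g. probability 2 (possible) × impact 3 (critical) = 6 → CONCERNS
```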
## Test Design Frameworks
- Use `docs/docs-v6/v6-bmm/test-levels-framework.md` for level selection and anti-patterns.
- Use `docs/docs-v6/v6-bmm/test-priorities-matrix.md` for P0-P3 priority criteria.
- Naming convention: `{epic}.{story}-{LEVEL}-{sequence}` (e.g., `2.4-E2E-01`).
- Tie each scenario to risk mitigations or acceptance criteria.
## Test Quality Definition of Done
- No hard waits (`page.waitForTimeout`, `cy.wait(ms)`)—use deterministic waits.
- Each test < 300 lines and executes in <= 1.5 minutes.
- Tests are stateless, parallel-safe, and self-cleaning.
- No conditional logic in tests (`if/else`, `try/catch` controlling flow).
- Explicit assertions live in tests, not hidden in helpers.
- Tests must run green locally and in CI with identical commands.
- A test delivers value only when it has failed at least once—design suites so they regularly catch regressions during development.
## NFR Status Criteria
- **Security**: PASS (auth, authz, secrets handled), CONCERNS (minor gaps), FAIL (critical exposure).
- **Performance**: PASS (meets targets, profiling evidence), CONCERNS (approaching limits), FAIL (breaches limits, leaks).
- **Reliability**: PASS (error handling, retries, health checks), CONCERNS (partial coverage), FAIL (no recovery, crashes).
- **Maintainability**: PASS (tests + docs + clean code), CONCERNS (duplication, low coverage), FAIL (no tests, tangled code).
- Unknown targets => CONCERNS until defined.
## Quality Gate Schema
```yaml
schema: 1
story: '{epic}.{story}'
story_title: '{title}'
gate: PASS|CONCERNS|FAIL|WAIVED
status_reason: 'Single sentence summary'
reviewer: 'Murat (Master Test Architect)'
updated: '2024-09-20T12:34:56Z'
waiver:
  active: false
  reason: ''
  approved_by: ''
  expires: ''
top_issues:
  - id: SEC-001
    severity: high
    finding: 'Issue description'
    suggested_action: 'Action to resolve'
risk_summary:
  totals:
    critical: 0
    high: 0
    medium: 0
    low: 0
  recommendations:
    must_fix: []
    monitor: []
nfr_validation:
  security: { status: PASS, notes: '' }
  performance: { status: CONCERNS, notes: 'Add caching' }
  reliability: { status: PASS, notes: '' }
  maintainability: { status: PASS, notes: '' }
history:
  - at: '2024-09-20T12:34:56Z'
    gate: CONCERNS
    note: 'Initial review'
```
- Optional sections: `quality_score` block for extended metrics, and `evidence` block (tests_reviewed, risks_identified, trace.ac_covered/ac_gaps) when teams track them.
## Collaborative TDD Loop
- Share failing acceptance tests with the developer or AI agent.
- Track red -> green -> refactor progress alongside the implementation checklist.
- Update checklist items as each test passes; add new tests for discovered edge cases.
- Keep conversation focused on observable behavior, not implementation detail.
## Traceability Coverage Definitions
- FULL: All scenarios for the criterion validated across appropriate levels.
- PARTIAL: Some coverage exists but gaps remain.
- NONE: No tests currently validate the criterion.
- UNIT-ONLY: Only low-level tests exist; add integration/E2E.
- INTEGRATION-ONLY: Missing unit/component coverage for fast feedback.
- Avoid naive UI E2E until service-level confidence exists; use API or contract tests to harden backends first, then add minimal UI coverage to fill the gaps.
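One way to sketch a coverage-matrix entry using those statuses (field names are assumptions):
```typescript
type Coverage = 'FULL' | 'PARTIAL' | 'NONE' | 'UNIT-ONLY' | 'INTEGRATION-ONLY';

interface TraceEntry {
  criterion: string; // acceptance criterion id, e.g. 'AC-2'
  tests: string[];   // scenario ids that validate it, e.g. '2.4-E2E-01'
  coverage: Coverage;
  nextStep?: string; // recommendation when coverage is not FULL
}

const matrix: TraceEntry[] = [
  { criterion: 'AC-1', tests: ['2.4-UNIT-03', '2.4-E2E-01'], coverage: 'FULL' },
  { criterion: 'AC-2', tests: ['2.4-UNIT-04'], coverage: 'UNIT-ONLY', nextStep: 'add integration coverage' },
];
```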
## CI Platform Guidance
- Default to GitHub Actions if no preference is given; otherwise ask for GitLab, CircleCI, etc.
- Ensure local script mirrors CI pipeline (npm test vs CI workflow).
- Use concurrency controls to prevent duplicate runs (`concurrency` block in GitHub Actions).
- Keep job runtime under 10 minutes; split further if necessary.
## Testing Tool Preferences
- Component testing: Large repositories prioritize Vitest with UI (fast, component-native). Smaller DX-first teams with existing Cypress stacks can keep Cypress Component Testing for consistency.
- E2E testing: Favor Playwright for large or performance-sensitive repos; reserve Cypress for smaller DX-first teams where developer experience outweighs scale.
- API testing: Prefer Playwright's API testing or contract suites over ad-hoc REST clients.
- Contract testing: Pact.js for consumer-driven contracts; keep `pact/` config in repo.
- Visual testing: Percy, Chromatic, or Playwright snapshots when UX must be audited.
## Naming Conventions
- File names: `ComponentName.cy.tsx` for Cypress component tests, `component-name.spec.ts` for Playwright, `ComponentName.test.tsx` for unit/RTL.
- Describe blocks: `describe('Feature/Component Name', () => { context('when condition', ...) })`.
- Data attributes: always kebab-case (`data-cy="submit-button"`, `data-testid="user-email"`).
## Reference Materials
If deeper context is needed, consult Murat's testing philosophy notes, blog posts, and sample repositories in https://github.com/muratkeremozcan/test-resources-for-ai/blob/main/gitingest-full-repo-text-version.txt.

View File

@@ -1,43 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Risk and Test Design v3.0 (Slim)
```xml
<task id="bmad/bmm/testarch/test-design" name="Risk andamp; Test Design">
<llm critical="true">
<i>Set command_key="*test-design"</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-commands.csv and parse the matching row</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-knowledge.md for risk-model and coverage heuristics</i>
<i>Use CSV columns preflight, flow_cues, deliverables, halt_rules, notes, knowledge_tags as the execution blueprint</i>
<i>Split pipe-delimited values into actionable checklists</i>
<i>Stay evidence-based—link risks and scenarios directly to PRD/architecture/story artifacts</i>
</llm>
<flow>
<step n="1" title="Preflight">
<action>Confirm story markdown, acceptance criteria, and architecture/PRD access.</action>
<action>Stop immediately if halt_rules trigger (missing inputs or unclear requirements).</action>
</step>
<step n="2" title="Assess Risks">
<action>Follow flow_cues to filter genuine risks, classify them (TECH/SEC/PERF/DATA/BUS/OPS), and score probability × impact.</action>
<action>Document mitigations with owners, timelines, and residual risk expectations.</action>
</step>
<step n="3" title="Design Coverage">
<action>Break acceptance criteria into atomic scenarios mapped to mitigations.</action>
<action>Choose test levels using test-levels-framework.md, assign priorities via test-priorities-matrix.md, and note tooling/data prerequisites.</action>
</step>
<step n="4" title="Deliverables">
<action>Generate the combined risk report and test design artifacts described in deliverables.</action>
<action>Summarize key risks, mitigations, coverage plan, and recommended execution order.</action>
</step>
</flow>
<halt>
<i>Apply halt_rules from the CSV row verbatim.</i>
</halt>
<notes>
<i>Use notes column for calibration reminders and coverage heuristics.</i>
</notes>
<output>
<i>Unified risk assessment plus coverage strategy ready for implementation.</i>
</output>
</task>
```

View File

@@ -1,38 +0,0 @@
<!-- Powered by BMAD-CORE™ -->
# Requirements Traceability v2.0 (Slim)
```xml
<task id="bmad/bmm/testarch/trace" name="Requirements Traceability">
<llm critical="true">
<i>Set command_key="*trace"</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-commands.csv and read the matching row</i>
<i>Load {project-root}/bmad/bmm/testarch/tea-knowledge.md emphasising assertions guidance</i>
<i>Use CSV columns preflight, flow_cues, deliverables, halt_rules, notes, knowledge_tags</i>
<i>Split pipe-delimited values into actionable lists</i>
<i>Focus on mapping reality: reference actual files, describe coverage gaps, recommend next steps</i>
</llm>
<flow>
<step n="1" title="Preflight">
<action>Validate prerequisites; halt per halt_rules if unmet</action>
</step>
<step n="2" title="Traceability Analysis">
<action>Follow flow_cues to map acceptance criteria to implemented tests</action>
<action>Leverage knowledge heuristics to highlight assertion quality and duplication risks</action>
</step>
<step n="3" title="Deliverables">
<action>Create traceability report described in deliverables</action>
<action>Summarize critical gaps and recommendations</action>
</step>
</flow>
<halt>
<i>Apply halt_rules from the CSV row</i>
</halt>
<notes>
<i>Reference notes column for additional emphasis</i>
</notes>
<output>
<i>Coverage matrix and narrative summary</i>
</output>
</task>
```