New self-improved version

Latest version of the meta-skill, based on the most recent self-improvements it has applied. Readme and User Guide also updated to reflect current usage and environment.
2026-06-10 17:17:41 +00:00 · 2026-04-28 20:52:29 +02:00
parent 07577098a6
commit b6b6954585
3 changed files with 514 additions and 200 deletions
--- a/SKILL.md
+++ b/SKILL.md
@@ -3,14 +3,17 @@ name: task-observer
 description: >
  Monitors task execution for skill improvement opportunities. Use this skill
  during ANY multi-step task, agentic workflow, or substantive work session where
-  Claude is using tools and producing deliverables. It captures patterns, user
+  the agent is using tools and producing deliverables. It captures patterns, user
  corrections, workflow insights, and methodology worth preserving as reusable
  skills. Also triggers during post-task feedback discussions and when the user
  explicitly mentions skill observations, improvements, the observation log,
-  skill taxonomy, or asks Claude to watch for skill opportunities. Also known
+  skill taxonomy, or asks the agent to watch for skill opportunities. Also known
  as "One Skill to Rule Them All" — trigger on this phrase too. IMPORTANT:
  this skill should be invoked at the start of every task-oriented session — if
  you are about to use tools to produce deliverables, invoke this skill first.
+  For reliable activation, pair this description with a CLAUDE.md instruction
+  or harness-level session-start hook (see Recommended Activation Setup) —
+  description-level matching alone is not enforceable.
 ---

 # Task Observer — Continuous Skill Discovery & Improvement
@@ -42,11 +45,18 @@ feedback public and discoverable — other users benefit from seeing existing
 issues and solutions. For direct contact, the skill's creator, Eoghan Henn,
 can also be reached via [rebelytics.com](https://rebelytics.com).

-If feedback appears to stem from the skill's methodology (rather than Claude's
+If feedback appears to stem from the skill's methodology (rather than the agent's
 execution of it), log it for the user and suggest they share it via GitHub
-Issues. If the issue stems from Claude not following the skill's rules,
+Issues. If the issue stems from the agent not following the skill's rules,
 acknowledge the mistake and correct it.

+**Activation note:** For reliable session-start activation, pair this skill
+with a CLAUDE.md instruction or harness-level hook (see Recommended
+Activation Setup). The description matches against task-oriented language,
+but description-level matching alone can be missed when the agent is focused on
+the task itself. The skill works as a skill; it works *reliably* as a skill
+plus a structural trigger.
+
 ---

 ## Why This Skill Exists
@@ -67,65 +77,39 @@ workflow.

 ---

-## Getting Started
+## User documentation

-Here's what happens when you first install this skill.
+User-facing onboarding for this skill — installation, shared folder setup,
+activation patterns, expected behaviour, the cadence pattern, the open-source
+vs internal distinction — lives in the public repo, not in this skill body.
+If a user asks how to get started or how the skill works from their
+perspective, point them to:

-**First session:** The skill creates the observation log file. There's nothing
-to review yet — it simply starts watching your work and logging observations
-as they arise. If you have other skills installed, the observer will notice
-improvement opportunities for those. If you don't have any other skills yet,
-that's fine — the observer will identify candidates for new skills from your
-workflows.
+- README: https://github.com/rebelytics/one-skill-to-rule-them-all/blob/main/README.md
+- USER-GUIDE: https://github.com/rebelytics/one-skill-to-rule-them-all/blob/main/USER-GUIDE.md

-**First few sessions:** Observations accumulate in the log. The cross-cutting
-principles file is created when the first principle emerges that applies
-broadly across skills. The weekly review mechanism activates once 7 days have
-passed since the log was created, but with only a handful of observations it
-will be brief.
+If web access is available, fetch the relevant section directly rather than
+paraphrasing — the public docs are the source of truth for user-facing
+guidance and are versioned independently. The remainder of this skill is
+operational instruction for the agent.

-**Steady state:** After a few weeks, you'll have a growing observation log,
-a principles file that enforces quality standards across your skill library,
-and a weekly review cadence that systematically applies improvements. The
-skill compounds in value as your skill library grows.
+## Conventions

-**What you need to start:** Nothing beyond the skill itself. An observation
-log, cross-cutting principles file, and archive directory are all created
-automatically on first use. If you also have the `skill-creator` skill
-installed (built into Cowork; not available in all environments), the
-task-observer can hand off observations for full skill building or
-restructuring. Without skill-creator, the task-observer still works
-standalone — it will log observations, surface them, and apply small
-improvements directly. Larger changes would be done manually.
-
-### Quick Start
-
-Want to get running in under 15 minutes? Here's the minimal path:
-
-1. **Install the skill.** Add task-observer to your skills directory.
-2. **Add the activation line.** Copy this into your CLAUDE.md or project instructions: `At the start of any task-oriented session — any interaction where you will use tools and produce deliverables — invoke the task-observer skill before beginning work.`
-3. **Do a real task.** Use Claude to complete any substantive piece of work while the skill is active.
-4. **See your first observation.** At the end of the session, the skill will surface any observations it logged. That's it.
-
-No pre-setup, no configuration files to edit manually. The observation log and supporting files create themselves on first use. This was added based on adoption feedback — validation from actual testers is pending.
-
-### What is `[workspace folder]`?
-
-Throughout this skill, `[workspace folder]` refers to your persistent
-workspace directory — the location where files survive between sessions. In
-Cowork, this is the folder you selected at the start of the session. In
-Claude Code, this is your project root. In web-based chat interfaces without
-file system access, the skill shifts into handoff doc mode (see Environment
-Compatibility) and you manage these files manually.
+`[workspace folder]` refers to the user's persistent workspace directory —
+the location where files survive between sessions. In Cowork, this is the
+folder selected at session start. In Claude Code, this is the project root.
+In web-based chat interfaces without filesystem access, the skill shifts
+into handoff doc mode (see Environment Compatibility) and the user manages
+these files manually.

 ---

 ## Recommended Activation Setup

 This skill needs to be invoked at the start of task-oriented sessions to work
-effectively. Because skill invocation depends on Claude matching the user's
+effectively. Because skill invocation depends on the agent matching the user's
 request against skill descriptions, a skill that monitors *all* tasks can be
-overlooked when Claude is focused on the task itself.
+overlooked when the agent is focused on the task itself.

 To maximise activation reliability, add the following instruction to your
 configuration file (e.g., CLAUDE.md, project instructions, or equivalent):
@@ -191,6 +175,63 @@ The detection approach depends on the environment:
 This check runs once at session start and does not repeat. Keep the
 suggestion brief — one or two sentences, not a full tutorial.

+### Compaction Behaviour
+
+When a session context compacts mid-task, the CLAUDE.md structural trigger
+re-invokes task-observer on the resumed session. No explicit re-invocation
+is needed on the agent's part — the same activation instruction that fired
+at the start of the original session fires again at the start of the
+resumed session, because the resumed session reads CLAUDE.md anew.
+Observations from before and after compaction append to the same log file
+with continuous numbering.
+
+This is the primary reason the CLAUDE.md structural trigger exists —
+description-level triggers alone would not reliably guarantee re-invocation
+on a resumed session, because the resumed session's opening message may
+not match task-observer's trigger phrases even when the ongoing task is
+task-oriented. The structural trigger fires regardless of the resumed
+session's opening message.
+
+---
+
+## The Pre-Flight Principle
+
+One of the most important patterns this skill should propagate to every skill
+it helps create or improve: **built-in enforcement.**
+
+Real-world experience has shown that rules documented in a skill are not
+always followed during the creative flow of producing output. The result:
+output that violates the skill's own standards, which reflects badly on the
+skill.
+
+The fix: every skill that contains explicit rules or requirements should
+include a verification step where the agent re-reads the rules and checks its
+output against them before delivery. This isn't overhead — it's quality
+assurance. A 30-second re-read prevents a 30-minute rework cycle.
+
+When creating or improving any skill through this observation process, ask:
+"Does this skill have rules? If yes, does it have a mechanism to enforce
+them?" If the answer to the second question is no, add one.
+
+### Self-Enforcement
+
+This skill practises what it preaches. Before surfacing observations at end
+of session, verify:
+
+1. Were observations logged throughout the full session — including during
+   post-task feedback, discussion phases, and reflective conversations, not
+   just during active tool use?
+2. Were observations logged silently without interrupting the user's flow?
+3. Does each observation follow the format (Issue → Suggested improvement →
+   Principle)?
+4. Is each observation tagged with the correct type (open-source or internal)?
+5. For any observations about existing skills, does the suggested improvement
+   reference the specific section or rule?
+6. For any observation tagged `type: open-source`, does the Principle field
+   contain any client-identifying information? If so, generalise it before
+   surfacing.
+If any observation fails these checks, fix it before surfacing.
+
 ---

 ## Skill Taxonomy
@@ -258,24 +299,68 @@ Internal skills are working documents, not published artifacts. Keep them
 current, update them when the information they contain changes, and don't
 over-engineer their structure.

+### Lean Content
+
+A skill should contain only content that meaningfully changes the agent's
+behaviour at execution time. Anything that doesn't — changelogs, version
+notes, "thanks to X" credits, self-narrating prose, or other
+maintainer-facing context — belongs in a supporting doc alongside the
+skill, not inside the SKILL.md itself.
+
+This rule cuts content the agent reads but doesn't act on. It does NOT cut
+examples, anti-patterns, or worked scenarios — those are load-bearing for
+rule adherence (bare rules without their context get violated more
+reliably than rules with context). The test is whether the content,
+removed, would change how the agent behaves. If yes, keep it. If no,
+move it out.
+
+Common examples of content that should live outside the skill:
+
+- Change history / release notes / version logs — keep in a supporting
+  history doc, in commit history, or both.
+- Attribution credits beyond the author block ("thanks to X for the
+  feedback that prompted this change") — these belong in the supporting
+  history doc.
+- Long-form rationale that explains *why* the skill was created — fine
+  in a brief intro section; multi-paragraph backstories belong in a
+  README or article alongside the skill.
+- Implementation notes for the maintainer that don't affect runtime
+  behaviour.
+
+Both open-source and internal skills are subject to this rule. The agent
+loads the skill's content into context on every invocation; every
+non-load-bearing line is paid token cost with no behavioural payoff.
+
 ---

 ## Licensing

-Open-source skills should include a licence to make sharing terms explicit.
-The recommended licence is **Creative Commons Attribution 4.0 International
-(CC BY 4.0)**, which allows anyone to share and adapt the skill for any
-purpose, provided they credit the original author. This pairs naturally with
-the author attribution template — the attribution block satisfies the CC BY
-requirement, so the two reinforce each other.
+Open-source skills should include an open-source licence to make sharing
+terms explicit. Any commonly recognised open-source licence works — the
+choice depends on the author's preference and what they're optimising for.
+Common options:

-Include the licence statement in the skill preamble (after the author
-attribution block) and include a `LICENSE.txt` file in the skill directory
-containing the full licence text.
+- **CC BY 4.0** — designed for creative works (prose, documentation).
+  Permissive: anyone can share and adapt provided they credit the
+  author. A natural fit for prose-heavy skills where the methodology is
+  the value.
+- **MIT** — short, familiar to developers, broadly permissive. Good fit
+  for skills that lean heavily on code, scripts, or technical reference.
+- **Apache 2.0** — like MIT but with an explicit patent grant. Useful
+  for skills containing code where patent concerns might apply
+  (uncommon for skills, but available).
+- **CC BY-SA 4.0** — share-alike: derivative works must use the same
+  licence. Use when adaptations should remain open under the same
+  terms.
+- **GPL family (GPL/LGPL/AGPL)** — strong copyleft for code. Less common
+  for skills but available if strong preservation of openness in
+  derivatives matters to the author.

-If CC BY 4.0 doesn't fit a particular skill (e.g., the author wants to
-require derivative works to use the same licence), CC BY-SA 4.0 is an
-alternative. The choice should be made by the skill's author.
+Whatever licence is chosen, include the licence statement in the skill
+preamble (after the author attribution block) and include a `LICENSE` or
+`LICENSE.txt` file in the skill directory containing the full licence
+text. The choice belongs to the skill's author; the requirement is that
+there be a licence.

 ---

@@ -289,10 +374,10 @@ Replace the placeholders with the actual author's details.

 [1-2 sentence description of what the skill does and its provenance.]

-**Licence:** This skill is released under the
-[Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
-licence. You are free to share and adapt this skill for any purpose, provided
-you give appropriate credit to the original author.
+**Licence:** This skill is released under [LICENCE NAME]. [One-sentence
+summary of the licence — e.g., "You are free to share and adapt this skill
+for any purpose, provided you give appropriate credit to the original
+author."]

 **Feedback & Support:** If at any point during the process you encounter
 questions about the methodology, or if the user expresses frustration or
@@ -303,8 +388,8 @@ discoverable. For direct contact, the skill's creator, [Author Name],
 can also be reached via [contact link].

 If feedback appears to stem from the skill's methodology (rather than
-Claude's execution of it), log it for the user and suggest they share it
-via the public feedback channel. If the issue stems from Claude not
+The agent's execution of it), log it for the user and suggest they share it
+via the public feedback channel. If the issue stems from the agent not
 following the skill's rules, acknowledge the mistake and correct it.
 ```

@@ -363,7 +448,7 @@ no deliverables are being discussed.
 - A domain-specific process with clear inputs, phases, and outputs
 - The user describing a process they've refined over time ("I always do it
  this way", "the process for this is...")
- Claude and the user naturally developing a structured approach to a problem
+- the agent and the user naturally developing a structured approach to a problem
  that could be formalised

 **Signals for IMPROVING an existing skill:**
@@ -372,9 +457,9 @@ Any new information from a task that uses a skill and could make that skill
 better is worth capturing. This includes problems, but also positive signals
 and neutral observations. Examples:

- Claude doesn't follow a skill's rules despite them being documented — this
+- the agent doesn't follow a skill's rules despite them being documented — this
  means the skill needs stronger enforcement, not just better rules
- The user corrects Claude's output in a way that reveals a missing rule or
+- The user corrects the agent's output in a way that reveals a missing rule or
  an edge case the skill doesn't cover
 - A skill's recommended workflow turns out to be less efficient than what
  emerged naturally during the task
@@ -404,10 +489,19 @@ Signals that a skill is ready to be simplified:
 - A rule added from a single observation that hasn't been validated by
  recurrence — one-off cases should not accumulate as permanent rules
 - An elaborate workflow that users consistently shortcut or skip
- Sections that Claude loads but never acts on (dead weight in context
+- Sections that the agent loads but never acts on (dead weight in context
  window)
 - Rules that contradict each other or create unnecessary complexity
 - Complexity added "just in case" that has never triggered
+- A documented rule that the agent consistently fails to follow — the rule
+  isn't reaching the moment of decision. The fix is rarely to write it more
+  loudly; usually it's either to remove the rule, or to convert it from
+  narrative guidance into structural enforcement (a checklist, a
+  verification step, or a tool call that can't be skipped).
+
+Treat the list above as a review checklist when looking at any of your own
+skills — a "yes" on any signal is a candidate for simplification or
+removal, not just a flag for future consideration.

 During weekly reviews, ask "what can we remove?" as deliberately as you ask
 "what should we add?" When a previously-applied observation turns out to be
@@ -435,6 +529,17 @@ Tie observation flushing to existing workflow checkpoints — e.g., when marking
 a TodoWrite item as completed, check whether any unlogged observations have
 accumulated and write them before proceeding.

+**Mandatory observation checkpoint after every 3rd TodoWrite completion:** After
+marking the 3rd, 6th, 9th (etc.) TodoWrite item as completed in a session,
+pause and explicitly ask: "Have any unlogged observations accumulated?" This is
+a hard checkpoint, not a suggestion — the skill has demonstrated that softer
+"check when completing items" guidance gets lost during cognitively demanding
+analytical work. The count doesn't need to be precise; the rule is: roughly
+every third completion, stop and flush. If nothing has accumulated, the pause
+costs seconds. If observations have accumulated, this prevents the common
+failure mode where the skill is loaded but no observations are written until
+the user explicitly asks.
+
 **Before assigning any observation number, run a mandatory pre-logging step:**
 Search the entire log file for all lines matching the pattern `### Observation \d+:`,
 extract the highest observation number already in use, and increment from there.
@@ -454,6 +559,72 @@ grep -o '### Observation [0-9]*' log.md | grep -o '[0-9]*' | sort -n | tail -1
 This prevents the recurring numbering collision issue where partial reads of large
 files create a false sense of awareness of the current count.

+**Write-time verification assertion (mandatory):** The pre-logging step above
+catches honest mistakes, but is vulnerable to parallel-session scenarios where
+multiple task-oriented sessions on the same day each compute "next number"
+against a snapshot and then collide on write. To catch this class of collision,
+after determining the proposed next number and immediately before appending,
+re-read the log and assert the number does not already exist:
+
+```bash
+PROPOSED=$(( $(grep -oP '### Observation \K\d+' log.md | sort -n | tail -1) + 1 ))
+grep -qE "^### Observation ${PROPOSED}:" log.md && {
+  echo "COLLISION on #${PROPOSED} — another writer has claimed this number"; exit 1; }
+# If assertion passes, proceed with the append using #${PROPOSED}.
+```
+
+If the assertion fires, increment past all existing numbers (not just by 1)
+and re-check. Treat an assertion failure as a meta-observation worth logging
+— it indicates either a parallel-session collision or a stale read elsewhere
+in the workflow.
+
+**Post-write verification (mandatory — closes the TOCTOU race):** The
+pre-write assertion catches stale-read collisions but cannot close the
+time-of-check-to-time-of-use race between the assertion and the append.
+In shell, `grep -q && cat >> ...` is two separate operations: the grep
+passes at T0, the append lands at T1. Any other session that appends
+between T0 and T1 can claim the same number — this race has been observed
+in production, producing duplicate observation pairs in the active log.
+
+After the append, re-read the log and count occurrences of the just-written
+observation number. If the count is greater than 1, a parallel session has
+collided — renumber the current session's entry to `max+1` in place via
+`sed`. Concrete shell:
+
+```bash
+WRITTEN=$(grep -cE "^### Observation ${PROPOSED}:" log.md)
+if [ "$WRITTEN" -gt 1 ]; then
+  # Find my line (the last occurrence, since I just appended) and renumber
+  MY_LINE=$(grep -nE "^### Observation ${PROPOSED}:" log.md \
+    | tail -1 | cut -d: -f1)
+  NEW_NUM=$(( $(grep -oP '^### Observation \K\d+' log.md \
+    | sort -n | tail -1) + 1 ))
+  sed -i "${MY_LINE}s/^### Observation ${PROPOSED}:/### Observation ${NEW_NUM}:/" log.md
+fi
+```
+
+This turns the pre-write assertion into a pre-and-post pair. Pre-write
+catches stale-read collisions cheaply; post-write catches race collisions
+by renumbering instead of failing. Either way, the log ends up with no
+duplicates. Alternative approaches — lockfile, atomic append, transactional
+write — are heavier and require more infrastructure; the
+post-write-verify-and-renumber pattern works with plain shell and
+self-heals.
+
+**Why both checks are required:** Stale-read collisions and race-condition
+collisions are different classes of error. The pre-write assertion closes
+the first; the post-write verification closes the second. Stacking more
+pre-write layers does not close race cases — only a post-write check can.
+When the shared state is a log file written by parallel agents, the
+reliable pattern is check-then-act-then-verify.
+
+**Session-start staleness check:** At the start of any task-oriented session,
+note the modification time of `log.md`. If it was modified in the last few
+hours (i.e., a parallel or recent session has been writing to it), be extra
+cautious about the numbering pre-check — do not trust any mental model of
+"current number" and always re-read the log immediately before appending each
+observation, not just once at session start.
+
 **Format and insertion rules:** Always use the `### Observation NNN:` format. Always append new observations to the END of the log file. Never insert observations mid-file. Never use alternative ID formats (e.g., `OBS-YYYY-MMDD-NN`). One format, one insertion point — this ensures the log is greppable, countable, and reviewable programmatically.

 Each observation follows this format:
@@ -468,7 +639,7 @@ Each observation follows this format:
 **Phase/Area:** [which part of the skill or workflow this relates to]

 **Issue:** [What happened or what was observed. Be specific — include what
-Claude did, what the user corrected, or what pattern emerged. Include enough
+The agent did, what the user corrected, or what pattern emerged. Include enough
 detail that someone reading this weeks later can understand the context
 without having seen the original conversation.]

@@ -624,6 +795,58 @@ confidentiality. When in doubt about whether a detail is too specific,
 remove it. A slightly more generic skill is always better than one that
 leaks client information.

+### Layer 5: Cross-Product Re-Identifiability Sweep
+
+Layers 1–4 focus on single-example scrubbing. They do not catch the case
+where two or three sanitised examples in the same skill — each fine on its
+own — combine to narrow the identifiable client set. A reader who knows
+the author's client portfolio (which is often public on a consultant's
+website) can triangulate even when each individual example is properly
+placeholdered. The failure mode is invisible to the author because they
+mentally compartmentalise each example; it's visible to any reader with
+adjacent context.
+
+**When to run it:** After every individual example has been sanitised —
+as a final pass before the skill ships or before any major public
+release. This is the last check, not a substitute for earlier layers.
+
+**What to look for:**
+
+- **Enumerated counts that match a known client count.** "Four builds
+  across three verticals" in a skill whose author has four public clients
+  across three verticals is functionally a directory. Blur the count
+  ("multiple builds") or the verticals ("across regulated, editorial,
+  and commerce contexts").
+- **Specific numbers in a thin vertical.** Visibility percentages,
+  revenue ranges, or geography given in a vertical where only one or two
+  candidates plausibly exist. A single real client can be narrowed from
+  "vertical × percentage × geography × timing" even when no name appears.
+  Replace specific numbers with illustrative ranges.
+- **Thinly-disguised placeholder names.** "Northwind Coffee" in a
+  specialty-retailer vertical where the only plausible specialty-retail
+  client is a coffee roaster reads as the real brand with a thin
+  rename. Use the Northwind / Contoso / Fabrikam placeholder family
+  explicitly, and make sure the placeholder's vertical is different from
+  any real client's vertical.
+
+**How to sweep:**
+
+1. List every worked example in the skill and the fields each one names
+   (vertical, geography, numeric range, timing, count).
+2. Ask: do any two examples share enough fields that a reader with access
+   to the author's public client list could map the set to real clients?
+3. Mitigate by blurring counts, widening verticals, dropping specific
+   numbers to illustrative ranges, or consolidating similar examples into
+   a single composite.
+
+**Why this is a separate layer:** Re-identification risk is combinatorial.
+Each additional sanitised example adds a field that narrows the candidate
+space. Layers 1–4 check each example in isolation and pass. The cross-
+product only emerges when the examples are read together. The author is
+the least reliable reader for this check because they know the ground
+truth — which is exactly why the sweep has to be a mechanical pass, not
+a feeling.
+
 ---

 ## Surfacing Protocol
@@ -657,16 +880,33 @@ candidates listed separately.

 ## Acting on Observations

-This skill identifies WHAT to build or improve. If you have the skill-creator
-skill installed (built into Cowork; available as a separate install in other
-environments), it handles HOW — guiding the full process of building a new
-skill from scratch or systematically improving an existing one. Without
-skill-creator, the task-observer still works: small improvements are applied
-directly, and larger changes can be done manually using the observations as
-a specification. The boundary between direct application and skill-creator
-handoff:
+This skill identifies WHAT to build or improve. This section covers HOW —
+specifically, the cross-context decision framework for choosing between
+direct application, skill-creator handoff, and new-skill creation.

-### Small Improvements (Apply Directly)
+**Trigger gate (when):** Observations are acted on only in three contexts:
+
+1. **The comprehensive review** — scheduled mode preferred, in-session
+   fallback if no scheduled review has run in 7+ days. See
+   "## Comprehensive Review (scheduled or fallback)" for the procedure.
+2. **Explicit user requests during a task session** — "update X skill",
+   "act on observation #N now", "apply this rule to the skill". The user
+   is naming the action; the agent executes within the framework below.
+3. **In-session correction when a skill is producing wrong output and
+   the user should be aware** — surface immediately rather than wait
+   for the next review.
+
+Observations are NOT applied during normal task sessions outside these
+contexts. Mid-task work produces observations only; those observations
+get applied at the next review or by request. The default is log,
+don't act.
+
+**Mechanism framework (which):** When acting in any of those contexts,
+the rest of this section guides the choice between applying changes
+directly to the skill file, handing off to the skill-creator for
+substantial restructuring, or creating a new skill from scratch.
+
+### Small Changes

 If the improvement is clearly additive, low-risk, and doesn't require testing
 to verify it works, it can be applied directly to the skill:
@@ -680,6 +920,8 @@ Examples: Adding a new anti-pattern to a skill's anti-patterns list.
 Clarifying that inline code comments should be context-aware within their
 own document.

+After creating or updating any skill file, always present it using `present_files` so the user can review and install it directly from the conversation.
+
 ### Substantial Changes (Use Skill-Creator if Available)

 If the change could affect the skill's behaviour in ways that need
@@ -718,6 +960,40 @@ When creating a new skill, determine its type early:
 - If uncertain, default to open-source — strip out specifics and generalise,
  then let the user decide whether any internal details need to be added

+
+## Task-Oriented Sessions — Observation vs Action
+
+Skill development and iteration work happens in multiple environments: in Cowork with persistent storage, in Claude Code with project directories, and in web-based chat without file system access. Cross-environment coordination is essential to prevent regressions — a skill updated in one environment can silently omit content from another if the wrong base file is used.
+
+### Skill file locations — read-only mount vs workspace copy
+
+When working with skills, understand the distinction between the **live file** (the authoritative source) and **workspace copies** (working drafts or staged updates):
+
+1. **The live file is read-only in Cowork.** In Cowork, the live skill file is mounted read-only at `.claude/skills/{skill}/SKILL.md`. You can read it, but you cannot edit it directly — the file system will reject write attempts with `EROFS` (Read-Only File System). This is intentional: it prevents accidental overwrites of the canonical version.
+
+2. **Read from the live file, not cached memory.** Always start skill edits by reading the current live file — not from a workspace copy, a prior draft, or a memory-based reconstruction. This is the only way to guarantee your updates are based on the current canonical content.
+
+3. **Stage edits in the workspace folder.** Write updated versions to `[workspace folder]/skill-updates/[date]/[skill-name]/SKILL.md`. This separation keeps the read-only mount clean and gives you a clear staging area for review before the user replaces the live file.
+
+4. **After staging, present the file for user review.** Always use `present_files` to show the updated skill so the user can review changes and upload directly. Do not attempt to write directly to the mounted skills directory — that will fail with a permission error.
+
+5. **Before overwriting or replacing any existing staged or workspace copy of a skill, diff it against the live file.** If they differ, the workspace copy is stale and your edits must be rebased on the live version — otherwise you risk silently dropping content added by another session. This rule is also codified in CLAUDE.md under "Skill Editing — Always Start From the Live File" as a cross-environment guard. The concrete failure mode: a Claude Code session produced an updated skill that was based on a stale snapshot and silently omitted two substantial sections added to the live skill earlier the same day. The regression was caught only because a pre-merge diff against the mount revealed the missing content.
+
+### Task-session skill updates — stage in the workspace
+
+When a task session produces a skill update (through weekly review, direct improvement, or observation-driven changes), follow this workflow:
+
+1. Read the live file at `.claude/skills/{skill}/SKILL.md`
+2. Make all edits to that content
+3. Save the complete updated file to `[workspace folder]/skill-updates/[today]/[skill-name]/SKILL.md`
+4. Use `present_files` to show it to the user for review
+5. The user uploads the file to install it
+
+This keeps the mount clean, stages updates for review, and gives you a clear separation between read-only source and working copy.
+
+**Cross-environment note:** Claude Code now shares the same skills as Cowork via the anthropic-skills capability. The "always start from the live file" rule applies in both environments. In Claude Code, the live file is surfaced by the capabilities system; in Cowork, it's the read-only mount at `.claude/skills/{skill}/SKILL.md`. The diff-before-overwrite requirement applies regardless of which environment produced the update.
+
+---
 ---

 ## Principle Propagation
@@ -785,55 +1061,135 @@ checklist during any skill creation or regeneration.

 ---

-## Weekly Comprehensive Review
+## Comprehensive Review (scheduled or fallback)

-Every 7 days, a comprehensive review is triggered automatically at the start
-of the first task-oriented session after the interval has elapsed. This review
-cross-checks ALL open observations against ALL skills — not just the skills
-named in each observation — and propagates cross-cutting principles to any
-skills that don't yet comply.
+The comprehensive review cross-checks all open observations against all
+skills, propagates cross-cutting principles to skills that don't yet
+comply, and applies the improvements that don't need user input. There
+are two ways it runs.
+
+**Preferred mode — scheduled autonomous review.** A user-defined recurring
+task (typical cadence: Monday/Wednesday/Friday mornings) registered with
+the agent's scheduling system. This is preferred because it picks up open
+observations on a regular cadence without depending on the user being
+mid-session at exactly the right moment, and because the user is not
+present, the review applies the non-escalated observations autonomously.
+
+**Fallback mode — in-session 7-day trigger.** If no scheduled review is
+registered (or none has run successfully in the last 7 days), a
+comprehensive review fires automatically at the start of the next
+task-oriented session. The fallback is a safety net for users who haven't
+set up scheduled reviews — either because the environment doesn't support
+scheduling or because they haven't done it yet.

 ### Trigger Mechanism

-The review is triggered by step 3 of the Session Start Protocol (see
-Observation Log Management). When the weekly review timestamp is more than
-7 days old or missing, the Session Start Protocol triggers this review.
-Inform the user that the weekly review is due and begin the process.
+**Scheduled mode** runs via the user's chosen scheduling tool — no in-skill
+trigger required.
+
+**Fallback mode** is triggered by step 3 of the Session Start Protocol
+(see Observation Log Management). The fallback fires when both of the
+following are true:
+
+- No scheduled review task is registered, OR the most recent successful
+  scheduled review was more than 7 days ago.
+- The in-session timestamp at
+  `[workspace folder]/skill-observations/last-review-date.txt` is also
+  more than 7 days old (or missing).
+
+When the fallback fires, inform the user that the comprehensive review is
+running and walk through Step 0 (recommend scheduling) before Step 1.
+
+### Interactive vs Scheduled Runs — Approval Policy
+
+The approval behaviour depends on who is present:
+
+**Interactive sessions (user present):** Always ask the user before applying
+or declining observations. Present observations grouped by skill with a one-
+sentence summary each, and wait for explicit approval (blanket "apply all" or
+selective). This preserves the collaborative feel and lets the user catch
+observations they disagree with before any staging occurs.
+
+**Scheduled autonomous runs (user not present):** Apply observations
+autonomously by default. The safety net is the staging-plus-upload pattern:
+updates go to `skill-updates/YYYY-MM-DD/{skill-name}/SKILL.md` and only
+become live when the user explicitly uploads them. Nothing can silently
+break because nothing is live until the user approves upload.
+
+**Escalate without applying (report only) when any of these apply:**
+
+1. **New skill creation.** Naming, scope, type (open-source vs internal),
+   and licence are decisions that benefit from user input. Note the
+   candidate in the report; don't create the skill.
+2. **Removing or substantially restructuring existing content.** Any edit
+   that deletes a section, replaces it with something smaller, or reshapes
+   core methodology risks dropping institutional memory. Flag and report.
+3. **An observation that flags its own uncertainty.** Phrases like "not
+   sure if...", "this might be...", "worth discussing..." in the
+   Suggested Improvement field are the observation asking for user input.
+   Respect that.
+4. **Conflicting observations.** Two observations that point in opposite
+   directions, or where the integration path isn't obvious, should be
+   surfaced rather than resolved autonomously.
+
+Scheduled runs that escalate should still apply every non-escalated
+observation before producing the report. A scheduled review that
+produces 0 applied updates is functionally a report generator, which
+wastes the scheduling.

 ### Review Steps

-**Step 0 — Scheduler availability check**
+**Step 0 — Recommend scheduled review setup**

-Before running the review itself, check whether the weekly review can be
-automated via the platform's task scheduling capability.
+Before running the in-session fallback, check whether scheduled autonomous
+reviews are set up. If not, surface a recommendation to the user — but
+respect prior declines.

-1. Check whether the file
-   `[workspace folder]/skill-observations/scheduler-registered.txt` exists.
-   If it does, the scheduled task has already been registered — skip to
-   Step 1.
+1. Check for the suppression marker at
+   `[workspace folder]/skill-observations/scheduled-review-decline.txt`.
+   If it exists and was last updated less than 30 days ago, AND the
+   in-session fallback has not fired multiple times in that window, skip
+   the recommendation. Proceed to Step 1.

-2. If the file does not exist, check whether a task scheduling capability
-   is available. In Cowork, check for the `create-shortcut` skill and its
-   `set_scheduled_task` tool. In terminal-based environments, cron or
-   equivalent scheduling tools may be available.
+2. Check whether a scheduled review task is registered. The signal is
+   either a presence check via the platform's scheduling tool (preferred)
+   or the existence of
+   `[workspace folder]/skill-observations/scheduler-registered.txt`. If a
+   registered scheduled review is found, no recommendation needed — skip
+   to Step 1.

-3. If a scheduling capability IS available:
-   - Read the draft task description at
-     `[workspace folder]/skill-observations/scheduled-task-draft.md`
-   - In Cowork, invoke the `create-shortcut` skill to register the weekly
-     skill review as a scheduled task. In other environments, use the
-     available scheduling mechanism.
-   - Use task name `weekly-skill-review` and a weekly cadence (e.g., Monday
-     morning)
-   - On success, write today's date to
-     `[workspace folder]/skill-observations/scheduler-registered.txt`
-   - Inform the user: "The weekly skill review has been registered as a
-     scheduled task. The manual `last-review-date.txt` trigger remains as
-     a fallback."
+3. If no scheduled review is registered AND no recent decline marker
+   exists (or the marker is stale because the fallback keeps firing),
+   make an active recommendation:

-4. If the tool is NOT available, proceed silently to Step 1. Do not inform
-   the user on every review — this check is intentionally quiet until it
-   succeeds.
+   > "I notice you don't have a recurring skill review scheduled. The
+   > task-observer recommends running this review on a cadence — e.g.,
+   > Monday/Wednesday/Friday mornings — so it doesn't depend on you
+   > being mid-session at the right moment. Want help setting one up?"
+
+   - **If the user says yes:** walk through registering a scheduled task
+     using the platform's scheduling capability. In Cowork, invoke the
+     `create-shortcut` skill and its `set_scheduled_task` tool. In
+     terminal-based environments, use cron or an equivalent scheduler.
+     Use task name `weekly-skill-review` (or similar) and a sensible
+     default cadence; let the user pick the day(s) and time. Once
+     registered, read the draft task description at
+     `[workspace folder]/skill-observations/scheduled-task-draft.md` and
+     pass it as the task prompt. On success, write today's date to
+     `[workspace folder]/skill-observations/scheduler-registered.txt`.
+   - **If the user says no or defers:** write today's date to
+     `[workspace folder]/skill-observations/scheduled-review-decline.txt`
+     to suppress the recommendation for 30 days. Proceed to Step 1 and
+     run the in-session fallback.
+
+4. If no scheduling capability is available in the current environment,
+   skip the recommendation silently and proceed to Step 1. Do not surface
+   the recommendation in environments where the user couldn't act on it.
+
+The 30-day suppression isn't permanent. If the in-session fallback keeps
+firing within the suppression window — a signal that the recurring need
+is real and the one-time decline was situational — the recommendation
+re-surfaces on the next firing.

 **Step 1 — Load observations and principles**

@@ -873,6 +1229,10 @@ contain general principles that apply more broadly than the original context
 suggested. Consider both the specific "Suggested improvement" and the general
 "Principle" fields. Build a mapping of skill → [relevant observations].

+**If the review is interactive (user present):** Present ALL observations to the user in a single message, grouped by skill. For each observation, show the number, title, and a one-sentence summary. Flag any observations that are ambiguous, risky, or require a judgment call as 'Needs your input'. All other observations are treated as straightforward and can be applied without individual discussion.
+
+**If the review is scheduled autonomous (user not present):** Skip the user-facing present step. Apply the approval policy from "Interactive vs Scheduled Runs" above: apply every non-escalated observation and record the escalated ones (new-skill candidates, removal/restructuring, self-flagged uncertainty, conflicting observations) in the review report without applying them. Proceed directly to Step 4.
+
 **Step 4 — Cross-check cross-cutting principles against every skill**

 For each active cross-cutting principle, check whether each skill already
@@ -880,8 +1240,7 @@ complies. Flag any skills that do not yet implement the principle.

 **Step 5 — Apply updates**

-For each skill that has relevant observations or non-compliant principles,
-create an updated version of its SKILL.md. When editing:
+In interactive runs, wait for user confirmation (blanket "apply all" or selective approval) before creating updates. In scheduled autonomous runs, proceed directly to applying all non-escalated observations. For each skill that has relevant observations or non-compliant principles, create an updated version of its SKILL.md. When editing:

 - Integrate the insight into the appropriate section of the skill (don't just
  append a list of observations at the bottom)
@@ -928,8 +1287,7 @@ Write today's date to

 **Step 8 — Present summary and user action items**

-Present the user with a clear summary and explicit instructions for what they
-need to do. Follow the format in Delivering Updated Skills below.
+Present each updated skill file using `present_files`, then show the user a summary following the format in Delivering Updated Skills above. The user can install updated skills directly from the conversation using the upload button on each presented file.

 ### Constraints

@@ -946,26 +1304,23 @@ need to do. Follow the format in Delivering Updated Skills below.
 ## Delivering Updated Skills to the User

 When the weekly review (or any other process) produces updated skill files,
-the updated files must be delivered to the user for manual replacement. Skill
-files live in a read-only location during sessions and may be managed in
-version control, synced across devices, or packaged for distribution.
-Automatic in-place editing is neither possible nor desirable — delivering to
-the workspace folder with explicit instructions keeps the user in control.
+they are delivered to the user through the conversation using `present_files`.
+Cowork's UI includes an upload button on presented skill files that allows
+the user to install them directly into their capabilities — no manual file
+copying needed.

 ### Delivery Process

-1. Save each updated SKILL.md to the workspace folder using this structure:
+1. Save each updated SKILL.md to the workspace folder for record-keeping:

   ```
   [workspace folder]/skill-updates/[date]/[skill-name]/SKILL.md
   ```

-   For example:
-   ```
-   [workspace folder]/skill-updates/2026-02-16/my-skill-name/SKILL.md
-   ```
+2. Present each updated skill file using `present_files` so the user can
+   review it inline and install it directly via the upload button.

-2. Present the user with an explicit action list using this format:
+3. Present the user with a summary using this format:

   ```
   ## Weekly Skill Review Complete — [date]
@@ -973,18 +1328,11 @@ the workspace folder with explicit instructions keeps the user in control.
   The following skills have been updated based on [N] open observations
   and [N] cross-cutting principles.

-   ### What you need to do
-
-   For each updated skill below, replace the existing SKILL.md in your
-   skill directory with the updated version from your workspace folder.
-
   ### Updated Skills

   **[skill-name]**
   - Changes: [1-sentence summary of what changed]
   - Observations applied: #[N], #[N]
-   - Updated file: [link to file in workspace folder]
-   - Action: Replace [skill-directory]/SKILL.md with this file

   [repeat for each updated skill]

@@ -992,12 +1340,17 @@ the workspace folder with explicit instructions keeps the user in control.
   [list of observation numbers and titles marked ACTIONED]

   ### Skipped (needs manual review)
-   [observations that were unclear or couldn't be applied, with explanation]
-
-   ### No Changes Needed
-   [skills that were checked but already compliant]
+   [any observations that couldn't be applied, with reasons]
   ```

+### Keep-Two Rule
+
+The `skill-updates/` directory uses a rolling retention policy: for any
+given skill, keep only the two most recent date directories. When a skill
+appears in more than two date directories, delete the oldest copies. This
+prevents the workspace from accumulating stale update history while still
+keeping a short rollback window.
+
 3. Do not proceed with other work until the user has acknowledged the
   summary. The user does not need to replace the files immediately, but
   they should be aware of what's pending.
@@ -1076,58 +1429,9 @@ provides the complete historical record.

 ---

-## The Pre-Flight Principle
-
-One of the most important patterns this skill should propagate to every skill
-it helps create or improve: **built-in enforcement.**
-
-Real-world experience has shown that rules documented in a skill are not
-always followed during the creative flow of producing output. The result:
-output that violates the skill's own standards, which reflects badly on the
-skill.
-
-The fix: every skill that contains explicit rules or requirements should
-include a verification step where Claude re-reads the rules and checks its
-output against them before delivery. This isn't overhead — it's quality
-assurance. A 30-second re-read prevents a 30-minute rework cycle.
-
-When creating or improving any skill through this observation process, ask:
-"Does this skill have rules? If yes, does it have a mechanism to enforce
-them?" If the answer to the second question is no, add one.
-
-### General Debugging Principle
-
-When debugging, always ask: is this a single instance or a pattern? If an
-error reveals a pattern (e.g., a class of similar issues), fix the class,
-not just the instance. Every specific error is a signal about a class of
-errors. Audit the full scope on first encounter to avoid discovering related
-failures in subsequent cycles.
-
-### Self-Enforcement
-
-This skill practises what it preaches. Before surfacing observations at end
-of session, verify:
-
-1. Were observations logged throughout the full session — including during
-   post-task feedback, discussion phases, and reflective conversations, not
-   just during active tool use?
-2. Were observations logged silently without interrupting the user's flow?
-3. Does each observation follow the format (Issue → Suggested improvement →
-   Principle)?
-4. Is each observation tagged with the correct type (open-source or internal)?
-5. For any observations about existing skills, does the suggested improvement
-   reference the specific section or rule?
-6. For any observation tagged `type: open-source`, does the Principle field
-   contain any client-identifying information? If so, generalise it before
-   surfacing.
-
-If any observation fails these checks, fix it before surfacing.
-
---
-
 ## Environment Compatibility

-The observation methodology works in any environment where Claude can interact
+The observation methodology works in any environment where the agent can interact
 with users during task-oriented work. The persistence mechanism is what varies.

 ### With Persistent Storage
@@ -1218,4 +1522,3 @@ environments.
 | Log archival? | Event-driven — resolved entries are archived on the next log write |
 | Simplification signals? | Watch for one-off rules, never-used sections, elaborate workflows users skip, and contradictions |
 | Handoff doc analysis? | Systematically extract implied observations from action items, open questions, and narrative sections |
-| Debugging approach? | Always identify the class of error, not just the instance; audit full scope on first encounter |