From b6b69545859f076dfca6fa0ae8c54be09650c85a Mon Sep 17 00:00:00 2001 From: Eoghan Henn Date: Tue, 28 Apr 2026 20:52:29 +0200 Subject: [PATCH] New self-improved version Latest version of the meta-skill, based on the most recent self-improvements it has applied. Readme and User Guide also updated to reflect current usage and environment. --- README.md | 8 +- SKILL.md | 697 ++++++++++++++++++++++++++++++++++++-------------- USER-GUIDE.md | 9 + 3 files changed, 514 insertions(+), 200 deletions(-) diff --git a/README.md b/README.md index 993fb25..29a5591 100644 --- a/README.md +++ b/README.md @@ -23,6 +23,8 @@ Task Observer monitors your work sessions and looks for three things: At the end of each session, it produces a structured observation log: what it noticed, which skills are affected, and specific suggested improvements. You review, approve, and your skills evolve. +Some observations reveal patterns that aren't specific to one skill. These get captured as **cross-cutting principles** in a separate file — and new skills are automatically checked against them whenever they're created or updated. The more you use the system, the higher the quality floor across your whole skill library. + ## Who it's for You don't need to be a developer. If you use skills in any capacity (for writing, research, client work, analysis, content creation, anything) and you want those skills to get better over time instead of staying frozen, this is for you. @@ -35,9 +37,9 @@ It's particularly valuable if you've built multiple skills and want a systematic The observer doesn't modify your skills directly. It produces recommendations that you review. You stay in control of what changes and when. -**In Claude Cowork (including Dispatch) or Claude Code in the desktop app:** Full experience. The observer writes observation logs to your filesystem, so improvements persist between sessions and can be actioned easily. 
+**In Claude Cowork (including Dispatch) or Claude Code in the desktop app:** Full experience. The observer writes observation logs to your filesystem, so improvements persist between sessions and can be actioned easily. Observations land in `[your shared folder]/skill-observations/`; proposed skill updates land in `[your shared folder]/skill-updates/`. You don't normally need to look at these directly — Claude handles them — but they're there if you want to inspect what's been captured. -**In Claude.ai web or or Claude Chat in the desktop app / mobile app:** Handoff doc mode. Since there's no filesystem access, the observer produces a structured handoff document at the end of your session that you can use to update your skills manually. +**In Claude.ai web or Claude Chat in the desktop app / mobile app:** Handoff doc mode. Since there's no filesystem access, the observer produces a structured handoff document at the end of your session that you can use to update your skills manually. ## Compatibility @@ -62,7 +64,7 @@ If you try it on another platform, please let me know how it goes. Issues and pu 2. Read the user guide at [https://github.com/rebelytics/one-skill-to-rule-them-all/blob/main/USER-GUIDE.md](https://github.com/rebelytics/one-skill-to-rule-them-all/blob/main/USER-GUIDE.md) 3. Make sure that the skill loads in all sessions where it's needed (I solved this via an instruction in my CLAUDE.md file) 4. Try to remember to ask "Any observations logged" when you finish a session (I do this every time I archive a session). Often, the skill then finds additional improvement potential that it didn't log before. -5. Schedule a recurring review session that applies all open obervations. Mine runs Monday, Wednesday and Friday morning, but you should adapt this to your needs. +5. Schedule a recurring review session that applies all open observations. Mine runs Monday, Wednesday and Friday morning, but you should adapt this to your needs. 
## The self-improving part diff --git a/SKILL.md b/SKILL.md index 57f6fbe..fe69e03 100644 --- a/SKILL.md +++ b/SKILL.md @@ -3,14 +3,17 @@ name: task-observer description: > Monitors task execution for skill improvement opportunities. Use this skill during ANY multi-step task, agentic workflow, or substantive work session where - Claude is using tools and producing deliverables. It captures patterns, user + the agent is using tools and producing deliverables. It captures patterns, user corrections, workflow insights, and methodology worth preserving as reusable skills. Also triggers during post-task feedback discussions and when the user explicitly mentions skill observations, improvements, the observation log, - skill taxonomy, or asks Claude to watch for skill opportunities. Also known + skill taxonomy, or asks the agent to watch for skill opportunities. Also known as "One Skill to Rule Them All" — trigger on this phrase too. IMPORTANT: this skill should be invoked at the start of every task-oriented session — if you are about to use tools to produce deliverables, invoke this skill first. + For reliable activation, pair this description with a CLAUDE.md instruction + or harness-level session-start hook (see Recommended Activation Setup) — + description-level matching alone is not enforceable. --- # Task Observer — Continuous Skill Discovery & Improvement @@ -42,11 +45,18 @@ feedback public and discoverable — other users benefit from seeing existing issues and solutions. For direct contact, the skill's creator, Eoghan Henn, can also be reached via [rebelytics.com](https://rebelytics.com). -If feedback appears to stem from the skill's methodology (rather than Claude's +If feedback appears to stem from the skill's methodology (rather than the agent's execution of it), log it for the user and suggest they share it via GitHub -Issues. If the issue stems from Claude not following the skill's rules, +Issues. 
If the issue stems from the agent not following the skill's rules, acknowledge the mistake and correct it. +**Activation note:** For reliable session-start activation, pair this skill +with a CLAUDE.md instruction or harness-level hook (see Recommended +Activation Setup). The description matches against task-oriented language, +but description-level matching alone can be missed when the agent is focused on +the task itself. The skill works as a skill; it works *reliably* as a skill +plus a structural trigger. + --- ## Why This Skill Exists @@ -67,65 +77,39 @@ workflow. --- -## Getting Started +## User documentation -Here's what happens when you first install this skill. +User-facing onboarding for this skill — installation, shared folder setup, +activation patterns, expected behaviour, the cadence pattern, the open-source +vs internal distinction — lives in the public repo, not in this skill body. +If a user asks how to get started or how the skill works from their +perspective, point them to: -**First session:** The skill creates the observation log file. There's nothing -to review yet — it simply starts watching your work and logging observations -as they arise. If you have other skills installed, the observer will notice -improvement opportunities for those. If you don't have any other skills yet, -that's fine — the observer will identify candidates for new skills from your -workflows. +- README: https://github.com/rebelytics/one-skill-to-rule-them-all/blob/main/README.md +- USER-GUIDE: https://github.com/rebelytics/one-skill-to-rule-them-all/blob/main/USER-GUIDE.md -**First few sessions:** Observations accumulate in the log. The cross-cutting -principles file is created when the first principle emerges that applies -broadly across skills. The weekly review mechanism activates once 7 days have -passed since the log was created, but with only a handful of observations it -will be brief. 
+If web access is available, fetch the relevant section directly rather than +paraphrasing — the public docs are the source of truth for user-facing +guidance and are versioned independently. The remainder of this skill is +operational instruction for the agent. -**Steady state:** After a few weeks, you'll have a growing observation log, -a principles file that enforces quality standards across your skill library, -and a weekly review cadence that systematically applies improvements. The -skill compounds in value as your skill library grows. +## Conventions -**What you need to start:** Nothing beyond the skill itself. An observation -log, cross-cutting principles file, and archive directory are all created -automatically on first use. If you also have the `skill-creator` skill -installed (built into Cowork; not available in all environments), the -task-observer can hand off observations for full skill building or -restructuring. Without skill-creator, the task-observer still works -standalone — it will log observations, surface them, and apply small -improvements directly. Larger changes would be done manually. - -### Quick Start - -Want to get running in under 15 minutes? Here's the minimal path: - -1. **Install the skill.** Add task-observer to your skills directory. -2. **Add the activation line.** Copy this into your CLAUDE.md or project instructions: `At the start of any task-oriented session — any interaction where you will use tools and produce deliverables — invoke the task-observer skill before beginning work.` -3. **Do a real task.** Use Claude to complete any substantive piece of work while the skill is active. -4. **See your first observation.** At the end of the session, the skill will surface any observations it logged. That's it. - -No pre-setup, no configuration files to edit manually. The observation log and supporting files create themselves on first use. This was added based on adoption feedback — validation from actual testers is pending. 
- -### What is `[workspace folder]`? - -Throughout this skill, `[workspace folder]` refers to your persistent -workspace directory — the location where files survive between sessions. In -Cowork, this is the folder you selected at the start of the session. In -Claude Code, this is your project root. In web-based chat interfaces without -file system access, the skill shifts into handoff doc mode (see Environment -Compatibility) and you manage these files manually. +`[workspace folder]` refers to the user's persistent workspace directory — +the location where files survive between sessions. In Cowork, this is the +folder selected at session start. In Claude Code, this is the project root. +In web-based chat interfaces without filesystem access, the skill shifts +into handoff doc mode (see Environment Compatibility) and the user manages +these files manually. --- ## Recommended Activation Setup This skill needs to be invoked at the start of task-oriented sessions to work -effectively. Because skill invocation depends on Claude matching the user's +effectively. Because skill invocation depends on the agent matching the user's request against skill descriptions, a skill that monitors *all* tasks can be -overlooked when Claude is focused on the task itself. +overlooked when the agent is focused on the task itself. To maximise activation reliability, add the following instruction to your configuration file (e.g., CLAUDE.md, project instructions, or equivalent): @@ -191,6 +175,63 @@ The detection approach depends on the environment: This check runs once at session start and does not repeat. Keep the suggestion brief — one or two sentences, not a full tutorial. +### Compaction Behaviour + +When a session context compacts mid-task, the CLAUDE.md structural trigger +re-invokes task-observer on the resumed session. 
No explicit re-invocation +is needed on the agent's part — the same activation instruction that fired +at the start of the original session fires again at the start of the +resumed session, because the resumed session reads CLAUDE.md anew. +Observations from before and after compaction append to the same log file +with continuous numbering. + +This is the primary reason the CLAUDE.md structural trigger exists — +description-level triggers alone would not reliably guarantee re-invocation +on a resumed session, because the resumed session's opening message may +not match task-observer's trigger phrases even when the ongoing task is +task-oriented. The structural trigger fires regardless of the resumed +session's opening message. + +--- + +## The Pre-Flight Principle + +One of the most important patterns this skill should propagate to every skill +it helps create or improve: **built-in enforcement.** + +Real-world experience has shown that rules documented in a skill are not +always followed during the creative flow of producing output. The result: +output that violates the skill's own standards, which reflects badly on the +skill. + +The fix: every skill that contains explicit rules or requirements should +include a verification step where the agent re-reads the rules and checks its +output against them before delivery. This isn't overhead — it's quality +assurance. A 30-second re-read prevents a 30-minute rework cycle. + +When creating or improving any skill through this observation process, ask: +"Does this skill have rules? If yes, does it have a mechanism to enforce +them?" If the answer to the second question is no, add one. + +### Self-Enforcement + +This skill practises what it preaches. Before surfacing observations at end +of session, verify: + +1. Were observations logged throughout the full session — including during + post-task feedback, discussion phases, and reflective conversations, not + just during active tool use? +2. 
Were observations logged silently without interrupting the user's flow? +3. Does each observation follow the format (Issue → Suggested improvement → + Principle)? +4. Is each observation tagged with the correct type (open-source or internal)? +5. For any observations about existing skills, does the suggested improvement + reference the specific section or rule? +6. For any observation tagged `type: open-source`, does the Principle field + contain any client-identifying information? If so, generalise it before + surfacing. +If any observation fails these checks, fix it before surfacing. + --- ## Skill Taxonomy @@ -258,24 +299,68 @@ Internal skills are working documents, not published artifacts. Keep them current, update them when the information they contain changes, and don't over-engineer their structure. +### Lean Content + +A skill should contain only content that meaningfully changes the agent's +behaviour at execution time. Anything that doesn't — changelogs, version +notes, "thanks to X" credits, self-narrating prose, or other +maintainer-facing context — belongs in a supporting doc alongside the +skill, not inside the SKILL.md itself. + +This rule cuts content the agent reads but doesn't act on. It does NOT cut +examples, anti-patterns, or worked scenarios — those are load-bearing for +rule adherence (bare rules without their context get violated more +reliably than rules with context). The test is whether the content, +removed, would change how the agent behaves. If yes, keep it. If no, +move it out. + +Common examples of content that should live outside the skill: + +- Change history / release notes / version logs — keep in a supporting + history doc, in commit history, or both. +- Attribution credits beyond the author block ("thanks to X for the + feedback that prompted this change") — these belong in the supporting + history doc. 
+- Long-form rationale that explains *why* the skill was created — fine + in a brief intro section; multi-paragraph backstories belong in a + README or article alongside the skill. +- Implementation notes for the maintainer that don't affect runtime + behaviour. + +Both open-source and internal skills are subject to this rule. The agent +loads the skill's content into context on every invocation; every +non-load-bearing line is paid token cost with no behavioural payoff. + --- ## Licensing -Open-source skills should include a licence to make sharing terms explicit. -The recommended licence is **Creative Commons Attribution 4.0 International -(CC BY 4.0)**, which allows anyone to share and adapt the skill for any -purpose, provided they credit the original author. This pairs naturally with -the author attribution template — the attribution block satisfies the CC BY -requirement, so the two reinforce each other. +Open-source skills should include an open-source licence to make sharing +terms explicit. Any commonly recognised open-source licence works — the +choice depends on the author's preference and what they're optimising for. +Common options: -Include the licence statement in the skill preamble (after the author -attribution block) and include a `LICENSE.txt` file in the skill directory -containing the full licence text. +- **CC BY 4.0** — designed for creative works (prose, documentation). + Permissive: anyone can share and adapt provided they credit the + author. A natural fit for prose-heavy skills where the methodology is + the value. +- **MIT** — short, familiar to developers, broadly permissive. Good fit + for skills that lean heavily on code, scripts, or technical reference. +- **Apache 2.0** — like MIT but with an explicit patent grant. Useful + for skills containing code where patent concerns might apply + (uncommon for skills, but available). +- **CC BY-SA 4.0** — share-alike: derivative works must use the same + licence. 
Use when adaptations should remain open under the same + terms. +- **GPL family (GPL/LGPL/AGPL)** — strong copyleft for code. Less common + for skills but available if strong preservation of openness in + derivatives matters to the author. -If CC BY 4.0 doesn't fit a particular skill (e.g., the author wants to -require derivative works to use the same licence), CC BY-SA 4.0 is an -alternative. The choice should be made by the skill's author. +Whatever licence is chosen, include the licence statement in the skill +preamble (after the author attribution block) and include a `LICENSE` or +`LICENSE.txt` file in the skill directory containing the full licence +text. The choice belongs to the skill's author; the requirement is that +there be a licence. --- @@ -289,10 +374,10 @@ Replace the placeholders with the actual author's details. [1-2 sentence description of what the skill does and its provenance.] -**Licence:** This skill is released under the -[Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) -licence. You are free to share and adapt this skill for any purpose, provided -you give appropriate credit to the original author. +**Licence:** This skill is released under [LICENCE NAME]. [One-sentence +summary of the licence — e.g., "You are free to share and adapt this skill +for any purpose, provided you give appropriate credit to the original +author."] **Feedback & Support:** If at any point during the process you encounter questions about the methodology, or if the user expresses frustration or @@ -303,8 +388,8 @@ discoverable. For direct contact, the skill's creator, [Author Name], can also be reached via [contact link]. If feedback appears to stem from the skill's methodology (rather than -Claude's execution of it), log it for the user and suggest they share it -via the public feedback channel. 
If the issue stems from Claude not +The agent's execution of it), log it for the user and suggest they share it +via the public feedback channel. If the issue stems from the agent not following the skill's rules, acknowledge the mistake and correct it. ``` @@ -363,7 +448,7 @@ no deliverables are being discussed. - A domain-specific process with clear inputs, phases, and outputs - The user describing a process they've refined over time ("I always do it this way", "the process for this is...") -- Claude and the user naturally developing a structured approach to a problem +- the agent and the user naturally developing a structured approach to a problem that could be formalised **Signals for IMPROVING an existing skill:** @@ -372,9 +457,9 @@ Any new information from a task that uses a skill and could make that skill better is worth capturing. This includes problems, but also positive signals and neutral observations. Examples: -- Claude doesn't follow a skill's rules despite them being documented — this +- the agent doesn't follow a skill's rules despite them being documented — this means the skill needs stronger enforcement, not just better rules -- The user corrects Claude's output in a way that reveals a missing rule or +- The user corrects the agent's output in a way that reveals a missing rule or an edge case the skill doesn't cover - A skill's recommended workflow turns out to be less efficient than what emerged naturally during the task @@ -404,10 +489,19 @@ Signals that a skill is ready to be simplified: - A rule added from a single observation that hasn't been validated by recurrence — one-off cases should not accumulate as permanent rules - An elaborate workflow that users consistently shortcut or skip -- Sections that Claude loads but never acts on (dead weight in context +- Sections that the agent loads but never acts on (dead weight in context window) - Rules that contradict each other or create unnecessary complexity - Complexity added "just in case" that 
has never triggered +- A documented rule that the agent consistently fails to follow — the rule + isn't reaching the moment of decision. The fix is rarely to write it more + loudly; usually it's either to remove the rule, or to convert it from + narrative guidance into structural enforcement (a checklist, a + verification step, or a tool call that can't be skipped). + +Treat the list above as a review checklist when looking at any of your own +skills — a "yes" on any signal is a candidate for simplification or +removal, not just a flag for future consideration. During weekly reviews, ask "what can we remove?" as deliberately as you ask "what should we add?" When a previously-applied observation turns out to be @@ -435,6 +529,17 @@ Tie observation flushing to existing workflow checkpoints — e.g., when marking a TodoWrite item as completed, check whether any unlogged observations have accumulated and write them before proceeding. +**Mandatory observation checkpoint after every 3rd TodoWrite completion:** After +marking the 3rd, 6th, 9th (etc.) TodoWrite item as completed in a session, +pause and explicitly ask: "Have any unlogged observations accumulated?" This is +a hard checkpoint, not a suggestion — the skill has demonstrated that softer +"check when completing items" guidance gets lost during cognitively demanding +analytical work. The count doesn't need to be precise; the rule is: roughly +every third completion, stop and flush. If nothing has accumulated, the pause +costs seconds. If observations have accumulated, this prevents the common +failure mode where the skill is loaded but no observations are written until +the user explicitly asks. + **Before assigning any observation number, run a mandatory pre-logging step:** Search the entire log file for all lines matching the pattern `### Observation \d+:`, extract the highest observation number already in use, and increment from there. 
@@ -454,6 +559,72 @@ grep -o '### Observation [0-9]*' log.md | grep -o '[0-9]*' | sort -n | tail -1 This prevents the recurring numbering collision issue where partial reads of large files create a false sense of awareness of the current count. +**Write-time verification assertion (mandatory):** The pre-logging step above +catches honest mistakes, but is vulnerable to parallel-session scenarios where +multiple task-oriented sessions on the same day each compute "next number" +against a snapshot and then collide on write. To catch this class of collision, +after determining the proposed next number and immediately before appending, +re-read the log and assert the number does not already exist: + +```bash +PROPOSED=$(( $(grep -oP '### Observation \K\d+' log.md | sort -n | tail -1) + 1 )) +grep -qE "^### Observation ${PROPOSED}:" log.md && { + echo "COLLISION on #${PROPOSED} — another writer has claimed this number"; exit 1; } +# If assertion passes, proceed with the append using #${PROPOSED}. +``` + +If the assertion fires, increment past all existing numbers (not just by 1) +and re-check. Treat an assertion failure as a meta-observation worth logging +— it indicates either a parallel-session collision or a stale read elsewhere +in the workflow. + +**Post-write verification (mandatory — closes the TOCTOU race):** The +pre-write assertion catches stale-read collisions but cannot close the +time-of-check-to-time-of-use race between the assertion and the append. +In shell, `grep -q && cat >> ...` is two separate operations: the grep +passes at T0, the append lands at T1. Any other session that appends +between T0 and T1 can claim the same number — this race has been observed +in production, producing duplicate observation pairs in the active log. + +After the append, re-read the log and count occurrences of the just-written +observation number. 
If the count is greater than 1, a parallel session has +collided — renumber the current session's entry to `max+1` in place via +`sed`. Concrete shell: + +```bash +WRITTEN=$(grep -cE "^### Observation ${PROPOSED}:" log.md) +if [ "$WRITTEN" -gt 1 ]; then + # Find my line (the last occurrence, since I just appended) and renumber + MY_LINE=$(grep -nE "^### Observation ${PROPOSED}:" log.md \ + | tail -1 | cut -d: -f1) + NEW_NUM=$(( $(grep -oP '^### Observation \K\d+' log.md \ + | sort -n | tail -1) + 1 )) + sed -i "${MY_LINE}s/^### Observation ${PROPOSED}:/### Observation ${NEW_NUM}:/" log.md +fi +``` + +This turns the pre-write assertion into a pre-and-post pair. Pre-write +catches stale-read collisions cheaply; post-write catches race collisions +by renumbering instead of failing. Either way, the log ends up with no +duplicates. Alternative approaches — lockfile, atomic append, transactional +write — are heavier and require more infrastructure; the +post-write-verify-and-renumber pattern works with plain shell and +self-heals. + +**Why both checks are required:** Stale-read collisions and race-condition +collisions are different classes of error. The pre-write assertion closes +the first; the post-write verification closes the second. Stacking more +pre-write layers does not close race cases — only a post-write check can. +When the shared state is a log file written by parallel agents, the +reliable pattern is check-then-act-then-verify. + +**Session-start staleness check:** At the start of any task-oriented session, +note the modification time of `log.md`. If it was modified in the last few +hours (i.e., a parallel or recent session has been writing to it), be extra +cautious about the numbering pre-check — do not trust any mental model of +"current number" and always re-read the log immediately before appending each +observation, not just once at session start. + **Format and insertion rules:** Always use the `### Observation NNN:` format. 
Always append new observations to the END of the log file. Never insert observations mid-file. Never use alternative ID formats (e.g., `OBS-YYYY-MMDD-NN`). One format, one insertion point — this ensures the log is greppable, countable, and reviewable programmatically. Each observation follows this format: @@ -468,7 +639,7 @@ Each observation follows this format: **Phase/Area:** [which part of the skill or workflow this relates to] **Issue:** [What happened or what was observed. Be specific — include what -Claude did, what the user corrected, or what pattern emerged. Include enough +the agent did, what the user corrected, or what pattern emerged. Include enough detail that someone reading this weeks later can understand the context without having seen the original conversation.] @@ -624,6 +795,58 @@ confidentiality. When in doubt about whether a detail is too specific, remove it. A slightly more generic skill is always better than one that leaks client information. +### Layer 5: Cross-Product Re-Identifiability Sweep + +Layers 1–4 focus on single-example scrubbing. They do not catch the case +where two or three sanitised examples in the same skill — each fine on its +own — combine to narrow the identifiable client set. A reader who knows +the author's client portfolio (which is often public on a consultant's +website) can triangulate even when each individual example is properly +placeholdered. The failure mode is invisible to the author because they +mentally compartmentalise each example; it's visible to any reader with +adjacent context. + +**When to run it:** After every individual example has been sanitised — +as a final pass before the skill ships or before any major public +release. This is the last check, not a substitute for earlier layers. 
+ +**What to look for:** + +- **Enumerated counts that match a known client count.** "Four builds + across three verticals" in a skill whose author has four public clients + across three verticals is functionally a directory. Blur the count + ("multiple builds") or the verticals ("across regulated, editorial, + and commerce contexts"). +- **Specific numbers in a thin vertical.** Visibility percentages, + revenue ranges, or geography given in a vertical where only one or two + candidates plausibly exist. A single real client can be narrowed from + "vertical × percentage × geography × timing" even when no name appears. + Replace specific numbers with illustrative ranges. +- **Thinly-disguised placeholder names.** "Northwind Coffee" in a + specialty-retailer vertical where the only plausible specialty-retail + client is a coffee roaster reads as the real brand with a thin + rename. Use the Northwind / Contoso / Fabrikam placeholder family + explicitly, and make sure the placeholder's vertical is different from + any real client's vertical. + +**How to sweep:** + +1. List every worked example in the skill and the fields each one names + (vertical, geography, numeric range, timing, count). +2. Ask: do any two examples share enough fields that a reader with access + to the author's public client list could map the set to real clients? +3. Mitigate by blurring counts, widening verticals, dropping specific + numbers to illustrative ranges, or consolidating similar examples into + a single composite. + +**Why this is a separate layer:** Re-identification risk is combinatorial. +Each additional sanitised example adds a field that narrows the candidate +space. Layers 1–4 check each example in isolation and pass. The cross- +product only emerges when the examples are read together. The author is +the least reliable reader for this check because they know the ground +truth — which is exactly why the sweep has to be a mechanical pass, not +a feeling. 
+ --- ## Surfacing Protocol @@ -657,16 +880,33 @@ candidates listed separately. ## Acting on Observations -This skill identifies WHAT to build or improve. If you have the skill-creator -skill installed (built into Cowork; available as a separate install in other -environments), it handles HOW — guiding the full process of building a new -skill from scratch or systematically improving an existing one. Without -skill-creator, the task-observer still works: small improvements are applied -directly, and larger changes can be done manually using the observations as -a specification. The boundary between direct application and skill-creator -handoff: +This skill identifies WHAT to build or improve. This section covers HOW — +specifically, the cross-context decision framework for choosing between +direct application, skill-creator handoff, and new-skill creation. -### Small Improvements (Apply Directly) +**Trigger gate (when):** Observations are acted on only in three contexts: + +1. **The comprehensive review** — scheduled mode preferred, in-session + fallback if no scheduled review has run in 7+ days. See + "## Comprehensive Review (scheduled or fallback)" for the procedure. +2. **Explicit user requests during a task session** — "update X skill", + "act on observation #N now", "apply this rule to the skill". The user + is naming the action; the agent executes within the framework below. +3. **In-session correction when a skill is producing wrong output and + the user should be aware** — surface immediately rather than wait + for the next review. + +Observations are NOT applied during normal task sessions outside these +contexts. Mid-task work produces observations only; those observations +get applied at the next review or by request. The default is log, +don't act. 
+ +**Mechanism framework (which):** When acting in any of those contexts, +the rest of this section guides the choice between applying changes +directly to the skill file, handing off to the skill-creator for +substantial restructuring, or creating a new skill from scratch. + +### Small Changes If the improvement is clearly additive, low-risk, and doesn't require testing to verify it works, it can be applied directly to the skill: @@ -680,6 +920,8 @@ Examples: Adding a new anti-pattern to a skill's anti-patterns list. Clarifying that inline code comments should be context-aware within their own document. +After creating or updating any skill file, always present it using `present_files` so the user can review and install it directly from the conversation. + ### Substantial Changes (Use Skill-Creator if Available) If the change could affect the skill's behaviour in ways that need @@ -718,6 +960,40 @@ When creating a new skill, determine its type early: - If uncertain, default to open-source — strip out specifics and generalise, then let the user decide whether any internal details need to be added + +## Task-Oriented Sessions — Observation vs Action + +Skill development and iteration work happens in multiple environments: in Cowork with persistent storage, in Claude Code with project directories, and in web-based chat without file system access. Cross-environment coordination is essential to prevent regressions — a skill updated in one environment can silently omit content from another if the wrong base file is used. + +### Skill file locations — read-only mount vs workspace copy + +When working with skills, understand the distinction between the **live file** (the authoritative source) and **workspace copies** (working drafts or staged updates): + +1. **The live file is read-only in Cowork.** In Cowork, the live skill file is mounted read-only at `.claude/skills/{skill}/SKILL.md`. 
You can read it, but you cannot edit it directly — the file system will reject write attempts with `EROFS` (Read-Only File System). This is intentional: it prevents accidental overwrites of the canonical version. + +2. **Read from the live file, not cached memory.** Always start skill edits by reading the current live file — not from a workspace copy, a prior draft, or a memory-based reconstruction. This is the only way to guarantee your updates are based on the current canonical content. + +3. **Stage edits in the workspace folder.** Write updated versions to `[workspace folder]/skill-updates/[date]/[skill-name]/SKILL.md`. This separation keeps the read-only mount clean and gives you a clear staging area for review before the user replaces the live file. + +4. **After staging, present the file for user review.** Always use `present_files` to show the updated skill so the user can review changes and upload directly. Do not attempt to write directly to the mounted skills directory — that will fail with a permission error. + +5. **Before overwriting or replacing any existing staged or workspace copy of a skill, diff it against the live file.** If they differ, the workspace copy is stale and your edits must be rebased on the live version — otherwise you risk silently dropping content added by another session. This rule is also codified in CLAUDE.md under "Skill Editing — Always Start From the Live File" as a cross-environment guard. The concrete failure mode: a Claude Code session produced an updated skill that was based on a stale snapshot and silently omitted two substantial sections added to the live skill earlier the same day. The regression was caught only because a pre-merge diff against the mount revealed the missing content. + +### Task-session skill updates — stage in the workspace + +When a task session produces a skill update (through weekly review, direct improvement, or observation-driven changes), follow this workflow: + +1. 
Read the live file at `.claude/skills/{skill}/SKILL.md` +2. Make all edits to that content +3. Save the complete updated file to `[workspace folder]/skill-updates/[today]/[skill-name]/SKILL.md` +4. Use `present_files` to show it to the user for review +5. The user uploads the file to install it + +This keeps the mount clean, stages updates for review, and gives you a clear separation between read-only source and working copy. + +**Cross-environment note:** Claude Code now shares the same skills as Cowork via the anthropic-skills capability. The "always start from the live file" rule applies in both environments. In Claude Code, the live file is surfaced by the capabilities system; in Cowork, it's the read-only mount at `.claude/skills/{skill}/SKILL.md`. The diff-before-overwrite requirement applies regardless of which environment produced the update. + +--- --- ## Principle Propagation @@ -785,55 +1061,135 @@ checklist during any skill creation or regeneration. --- -## Weekly Comprehensive Review +## Comprehensive Review (scheduled or fallback) -Every 7 days, a comprehensive review is triggered automatically at the start -of the first task-oriented session after the interval has elapsed. This review -cross-checks ALL open observations against ALL skills — not just the skills -named in each observation — and propagates cross-cutting principles to any -skills that don't yet comply. +The comprehensive review cross-checks all open observations against all +skills, propagates cross-cutting principles to skills that don't yet +comply, and applies the improvements that don't need user input. There +are two ways it runs. + +**Preferred mode — scheduled autonomous review.** A user-defined recurring +task (typical cadence: Monday/Wednesday/Friday mornings) registered with +the agent's scheduling system. 
This is preferred because it picks up open +observations on a regular cadence without depending on the user being +mid-session at exactly the right moment, and because the user is not +present, the review applies the non-escalated observations autonomously. + +**Fallback mode — in-session 7-day trigger.** If no scheduled review is +registered (or none has run successfully in the last 7 days), a +comprehensive review fires automatically at the start of the next +task-oriented session. The fallback is a safety net for users who haven't +set up scheduled reviews — either because the environment doesn't support +scheduling or because they haven't done it yet. ### Trigger Mechanism -The review is triggered by step 3 of the Session Start Protocol (see -Observation Log Management). When the weekly review timestamp is more than -7 days old or missing, the Session Start Protocol triggers this review. -Inform the user that the weekly review is due and begin the process. +**Scheduled mode** runs via the user's chosen scheduling tool — no in-skill +trigger required. + +**Fallback mode** is triggered by step 3 of the Session Start Protocol +(see Observation Log Management). The fallback fires when both of the +following are true: + +- No scheduled review task is registered, OR the most recent successful + scheduled review was more than 7 days ago. +- The in-session timestamp at + `[workspace folder]/skill-observations/last-review-date.txt` is also + more than 7 days old (or missing). + +When the fallback fires, inform the user that the comprehensive review is +running and walk through Step 0 (recommend scheduling) before Step 1. + +### Interactive vs Scheduled Runs — Approval Policy + +The approval behaviour depends on who is present: + +**Interactive sessions (user present):** Always ask the user before applying +or declining observations. 
Present observations grouped by skill with a one- +sentence summary each, and wait for explicit approval (blanket "apply all" or +selective). This preserves the collaborative feel and lets the user catch +observations they disagree with before any staging occurs. + +**Scheduled autonomous runs (user not present):** Apply observations +autonomously by default. The safety net is the staging-plus-upload pattern: +updates go to `skill-updates/YYYY-MM-DD/{skill-name}/SKILL.md` and only +become live when the user explicitly uploads them. Nothing can silently +break because nothing is live until the user approves upload. + +**Escalate without applying (report only) when any of these apply:** + +1. **New skill creation.** Naming, scope, type (open-source vs internal), + and licence are decisions that benefit from user input. Note the + candidate in the report; don't create the skill. +2. **Removing or substantially restructuring existing content.** Any edit + that deletes a section, replaces it with something smaller, or reshapes + core methodology risks dropping institutional memory. Flag and report. +3. **An observation that flags its own uncertainty.** Phrases like "not + sure if...", "this might be...", "worth discussing..." in the + Suggested Improvement field are the observation asking for user input. + Respect that. +4. **Conflicting observations.** Two observations that point in opposite + directions, or where the integration path isn't obvious, should be + surfaced rather than resolved autonomously. + +Scheduled runs that escalate should still apply every non-escalated +observation before producing the report. A scheduled review that +produces 0 applied updates is functionally a report generator, which +wastes the scheduling. ### Review Steps -**Step 0 — Scheduler availability check** +**Step 0 — Recommend scheduled review setup** -Before running the review itself, check whether the weekly review can be -automated via the platform's task scheduling capability. 
+Before running the in-session fallback, check whether scheduled autonomous +reviews are set up. If not, surface a recommendation to the user — but +respect prior declines. -1. Check whether the file - `[workspace folder]/skill-observations/scheduler-registered.txt` exists. - If it does, the scheduled task has already been registered — skip to - Step 1. +1. Check for the suppression marker at + `[workspace folder]/skill-observations/scheduled-review-decline.txt`. + If it exists and was last updated less than 30 days ago, AND the + in-session fallback has not fired multiple times in that window, skip + the recommendation. Proceed to Step 1. -2. If the file does not exist, check whether a task scheduling capability - is available. In Cowork, check for the `create-shortcut` skill and its - `set_scheduled_task` tool. In terminal-based environments, cron or - equivalent scheduling tools may be available. +2. Check whether a scheduled review task is registered. The signal is + either a presence check via the platform's scheduling tool (preferred) + or the existence of + `[workspace folder]/skill-observations/scheduler-registered.txt`. If a + registered scheduled review is found, no recommendation needed — skip + to Step 1. -3. If a scheduling capability IS available: - - Read the draft task description at - `[workspace folder]/skill-observations/scheduled-task-draft.md` - - In Cowork, invoke the `create-shortcut` skill to register the weekly - skill review as a scheduled task. In other environments, use the - available scheduling mechanism. - - Use task name `weekly-skill-review` and a weekly cadence (e.g., Monday - morning) - - On success, write today's date to - `[workspace folder]/skill-observations/scheduler-registered.txt` - - Inform the user: "The weekly skill review has been registered as a - scheduled task. The manual `last-review-date.txt` trigger remains as - a fallback." +3. 
If no scheduled review is registered AND no recent decline marker + exists (or the marker is stale because the fallback keeps firing), + make an active recommendation: -4. If the tool is NOT available, proceed silently to Step 1. Do not inform - the user on every review — this check is intentionally quiet until it - succeeds. + > "I notice you don't have a recurring skill review scheduled. The + > task-observer recommends running this review on a cadence — e.g., + > Monday/Wednesday/Friday mornings — so it doesn't depend on you + > being mid-session at the right moment. Want help setting one up?" + + - **If the user says yes:** walk through registering a scheduled task + using the platform's scheduling capability. In Cowork, invoke the + `create-shortcut` skill and its `set_scheduled_task` tool. In + terminal-based environments, use cron or an equivalent scheduler. + Use task name `weekly-skill-review` (or similar) and a sensible + default cadence; let the user pick the day(s) and time. Once + registered, read the draft task description at + `[workspace folder]/skill-observations/scheduled-task-draft.md` and + pass it as the task prompt. On success, write today's date to + `[workspace folder]/skill-observations/scheduler-registered.txt`. + - **If the user says no or defers:** write today's date to + `[workspace folder]/skill-observations/scheduled-review-decline.txt` + to suppress the recommendation for 30 days. Proceed to Step 1 and + run the in-session fallback. + +4. If no scheduling capability is available in the current environment, + skip the recommendation silently and proceed to Step 1. Do not surface + the recommendation in environments where the user couldn't act on it. + +The 30-day suppression isn't permanent. If the in-session fallback keeps +firing within the suppression window — a signal that the recurring need +is real and the one-time decline was situational — the recommendation +re-surfaces on the next firing. 
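The Step 0 decision can be sketched as a single predicate. This is an illustration under stated assumptions, not the skill's implementation: the function name, its parameters, and the `refire_threshold` of two firings (standing in for "multiple times") are hypothetical, and the skill does not specify how fallback firings are counted. The dates would come from `scheduled-review-decline.txt`, and the registration flag from `scheduler-registered.txt` or the platform's scheduling tool.

```python
from datetime import date, timedelta
from typing import Optional

# Suppression period after the user declines the recommendation.
SUPPRESSION_WINDOW = timedelta(days=30)


def should_recommend_scheduling(
    today: date,
    declined_on: Optional[date],
    scheduler_registered: bool,
    scheduling_available: bool,
    fallback_firings_since_decline: int = 0,
    refire_threshold: int = 2,  # assumption: "multiple times" read as two or more
) -> bool:
    """Decide whether to surface the scheduled-review recommendation."""
    # The user could not act on the recommendation here, so stay silent.
    if not scheduling_available:
        return False
    # A scheduled review already exists, so nothing to recommend.
    if scheduler_registered:
        return False
    # No prior decline on record, so recommend.
    if declined_on is None:
        return True
    marker_fresh = (today - declined_on) < SUPPRESSION_WINDOW
    fallback_keeps_firing = fallback_firings_since_decline >= refire_threshold
    # Respect a fresh decline unless the fallback keeps firing inside the window.
    return (not marker_fresh) or fallback_keeps_firing
```

For example, a decline written eight days ago with no further fallback firings suppresses the prompt, while the same decline with two firings since then re-surfaces it.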
**Step 1 — Load observations and principles** @@ -873,6 +1229,10 @@ contain general principles that apply more broadly than the original context suggested. Consider both the specific "Suggested improvement" and the general "Principle" fields. Build a mapping of skill → [relevant observations]. +**If the review is interactive (user present):** Present ALL observations to the user in a single message, grouped by skill. For each observation, show the number, title, and a one-sentence summary. Flag any observations that are ambiguous, risky, or require a judgment call as 'Needs your input'. All other observations are treated as straightforward and can be applied without individual discussion. + +**If the review is scheduled autonomous (user not present):** Skip the user-facing present step. Apply the approval policy from "Interactive vs Scheduled Runs" above: apply every non-escalated observation and record the escalated ones (new-skill candidates, removal/restructuring, self-flagged uncertainty, conflicting observations) in the review report without applying them. Proceed directly to Step 4. + **Step 4 — Cross-check cross-cutting principles against every skill** For each active cross-cutting principle, check whether each skill already @@ -880,8 +1240,7 @@ complies. Flag any skills that do not yet implement the principle. **Step 5 — Apply updates** -For each skill that has relevant observations or non-compliant principles, -create an updated version of its SKILL.md. When editing: +In interactive runs, wait for user confirmation (blanket "apply all" or selective approval) before creating updates. In scheduled autonomous runs, proceed directly to applying all non-escalated observations. For each skill that has relevant observations or non-compliant principles, create an updated version of its SKILL.md. 
When editing: - Integrate the insight into the appropriate section of the skill (don't just append a list of observations at the bottom) @@ -928,8 +1287,7 @@ Write today's date to **Step 8 — Present summary and user action items** -Present the user with a clear summary and explicit instructions for what they -need to do. Follow the format in Delivering Updated Skills below. +Present each updated skill file using `present_files`, then show the user a summary following the format in Delivering Updated Skills below. The user can install updated skills directly from the conversation using the upload button on each presented file. ### Constraints @@ -946,26 +1304,23 @@ need to do. Follow the format in Delivering Updated Skills below. ## Delivering Updated Skills to the User When the weekly review (or any other process) produces updated skill files, -the updated files must be delivered to the user for manual replacement. Skill -files live in a read-only location during sessions and may be managed in -version control, synced across devices, or packaged for distribution. -Automatic in-place editing is neither possible nor desirable — delivering to -the workspace folder with explicit instructions keeps the user in control. +they are delivered to the user through the conversation using `present_files`. +Cowork's UI includes an upload button on presented skill files that allows +the user to install them directly into their capabilities — no manual file +copying needed. ### Delivery Process -1. Save each updated SKILL.md to the workspace folder using this structure: +1. Save each updated SKILL.md to the workspace folder for record-keeping: ``` [workspace folder]/skill-updates/[date]/[skill-name]/SKILL.md ``` - For example: - ``` - [workspace folder]/skill-updates/2026-02-16/my-skill-name/SKILL.md - ``` +2. Present each updated skill file using `present_files` so the user can + review it inline and install it directly via the upload button. -2.
Present the user with an explicit action list using this format: +3. Present the user with a summary using this format: ``` ## Weekly Skill Review Complete — [date] @@ -973,18 +1328,11 @@ the workspace folder with explicit instructions keeps the user in control. The following skills have been updated based on [N] open observations and [N] cross-cutting principles. - ### What you need to do - - For each updated skill below, replace the existing SKILL.md in your - skill directory with the updated version from your workspace folder. - ### Updated Skills **[skill-name]** - Changes: [1-sentence summary of what changed] - Observations applied: #[N], #[N] - - Updated file: [link to file in workspace folder] - - Action: Replace [skill-directory]/SKILL.md with this file [repeat for each updated skill] @@ -992,12 +1340,17 @@ the workspace folder with explicit instructions keeps the user in control. [list of observation numbers and titles marked ACTIONED] ### Skipped (needs manual review) - [observations that were unclear or couldn't be applied, with explanation] - - ### No Changes Needed - [skills that were checked but already compliant] + [any observations that couldn't be applied, with reasons] ``` +### Keep-Two Rule + +The `skill-updates/` directory uses a rolling retention policy: for any +given skill, keep only the two most recent date directories. When a skill +appears in more than two date directories, delete the oldest copies. This +prevents the workspace from accumulating stale update history while still +keeping a short rollback window. + 3. Do not proceed with other work until the user has acknowledged the summary. The user does not need to replace the files immediately, but they should be aware of what's pending. @@ -1076,58 +1429,9 @@ provides the complete historical record. 
--- -## The Pre-Flight Principle - -One of the most important patterns this skill should propagate to every skill -it helps create or improve: **built-in enforcement.** - -Real-world experience has shown that rules documented in a skill are not -always followed during the creative flow of producing output. The result: -output that violates the skill's own standards, which reflects badly on the -skill. - -The fix: every skill that contains explicit rules or requirements should -include a verification step where Claude re-reads the rules and checks its -output against them before delivery. This isn't overhead — it's quality -assurance. A 30-second re-read prevents a 30-minute rework cycle. - -When creating or improving any skill through this observation process, ask: -"Does this skill have rules? If yes, does it have a mechanism to enforce -them?" If the answer to the second question is no, add one. - -### General Debugging Principle - -When debugging, always ask: is this a single instance or a pattern? If an -error reveals a pattern (e.g., a class of similar issues), fix the class, -not just the instance. Every specific error is a signal about a class of -errors. Audit the full scope on first encounter to avoid discovering related -failures in subsequent cycles. - -### Self-Enforcement - -This skill practises what it preaches. Before surfacing observations at end -of session, verify: - -1. Were observations logged throughout the full session — including during - post-task feedback, discussion phases, and reflective conversations, not - just during active tool use? -2. Were observations logged silently without interrupting the user's flow? -3. Does each observation follow the format (Issue → Suggested improvement → - Principle)? -4. Is each observation tagged with the correct type (open-source or internal)? -5. For any observations about existing skills, does the suggested improvement - reference the specific section or rule? -6. 
For any observation tagged `type: open-source`, does the Principle field - contain any client-identifying information? If so, generalise it before - surfacing. - -If any observation fails these checks, fix it before surfacing. - ---- - ## Environment Compatibility -The observation methodology works in any environment where Claude can interact +The observation methodology works in any environment where the agent can interact with users during task-oriented work. The persistence mechanism is what varies. ### With Persistent Storage @@ -1218,4 +1522,3 @@ environments. | Log archival? | Event-driven — resolved entries are archived on the next log write | | Simplification signals? | Watch for one-off rules, never-used sections, elaborate workflows users skip, and contradictions | | Handoff doc analysis? | Systematically extract implied observations from action items, open questions, and narrative sections | -| Debugging approach? | Always identify the class of error, not just the instance; audit full scope on first encounter | diff --git a/USER-GUIDE.md b/USER-GUIDE.md index 6defc57..329c857 100644 --- a/USER-GUIDE.md +++ b/USER-GUIDE.md @@ -26,6 +26,15 @@ Once you start your first task in Cowork after installing the meta-skill, make s I started with an empty folder just for Claude Cowork, and it turned into a thriving knowledge base within days. If you prefer to give Claude access to a folder that already has files in it, that's also fine: no risk, no fun. +### What gets stored where + +The meta-skill writes only to its own subdirectories of your shared folder: + +- `[your shared folder]/skill-observations/` — the observation log, the cross-cutting principles file, and an archive of resolved observations +- `[your shared folder]/skill-updates/` — staged versions of skill updates that are waiting for you to install them + +Existing files in your shared folder are not modified by the observer. 
If you point the meta-skill at a folder that already contains client work or personal documents, those files stay where they are; the observer only reads from them when you ask Claude to use them in a task. Skill updates are also never installed automatically — they're staged in `skill-updates/` for you to review and install yourself. + ## Checking whether the skill has loaded Once you've started a Cowork task by giving your first instructions or some context about the work, you can check in the right sidebar which skills have been invoked. If you think the current task has skill creation or improvement potential but you don't see the task observer, ask Claude directly why the skill hasn't loaded. It should then guide you towards a better setup.