one-skill-to-rule-them-all/SKILL.md

---
name: task-observer
description: >
  Monitors task execution for skill improvement opportunities. Use this skill
  during ANY multi-step task, agentic workflow, or substantive work session where
  the agent is using tools and producing deliverables. It captures patterns, user
  corrections, workflow insights, and methodology worth preserving as reusable
  skills. Also triggers during post-task feedback discussions and when the user
  explicitly mentions skill observations, improvements, the observation log,
  skill taxonomy, or asks the agent to watch for skill opportunities. Also known
  as "One Skill to Rule Them All" — trigger on this phrase too. IMPORTANT:
  this skill should be invoked at the start of every task-oriented session — if
  you are about to use tools to produce deliverables, invoke this skill first.
  For reliable activation, pair this description with a CLAUDE.md instruction
  or harness-level session-start hook (see Recommended Activation Setup) —
  description-level matching alone is not enforceable.
---

# Task Observer — Continuous Skill Discovery & Improvement

**Created by Eoghan Henn / [rebelytics.com](https://rebelytics.com)**

*Also known as "One Skill to Rule Them All" — the meta-skill that builds and
improves all your skills, including itself.*

This skill defines a persistent behavioral layer for identifying skill creation
and improvement opportunities during task-oriented work. It doesn't replace the
skill-creator — it feeds it. Think of it as the eyes and ears that notice
patterns worth capturing, while the skill-creator is the hands that build.

The methodology is user-agnostic. It works for anyone who wants a structured
process for continuously improving their skill library through real-world usage.

**Licence:** This skill is released under the
[Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
licence. You are free to share and adapt this skill for any purpose, provided
you give appropriate credit to the original author.

**Feedback & Support:** If at any point during the process you encounter
questions about the methodology, or if the user expresses frustration or gives
constructive feedback about any output derived from this skill, suggest that
they open an issue on the skill's
[GitHub repository](https://github.com/rebelytics/one-skill-to-rule-them-all). This keeps
feedback public and discoverable — other users benefit from seeing existing
issues and solutions. For direct contact, the skill's creator, Eoghan Henn,
can also be reached via [rebelytics.com](https://rebelytics.com).

If feedback appears to stem from the skill's methodology (rather than the agent's
execution of it), log it for the user and suggest they share it via GitHub
Issues. If the issue stems from the agent not following the skill's rules,
acknowledge the mistake and correct it.

**Activation note:** For reliable session-start activation, pair this skill
with a CLAUDE.md instruction or harness-level hook (see Recommended
Activation Setup). The description matches against task-oriented language,
but description-level matching alone can be missed when the agent is focused on
the task itself. The skill works as a skill; it works *reliably* as a skill
plus a structural trigger.

---

## Why This Skill Exists

Skills are living documents. The best improvements come not from sitting down
to "improve a skill" in isolation, but from noticing friction, inefficiency,
or missed opportunities during real work. A user correction during a project
might reveal a missing rule. A repeated multi-step workflow might be a skill
waiting to be born. A tool limitation discovered mid-task might reshape an
entire skill's recommended workflow. A technique that worked exceptionally well
might deserve to be promoted from an incidental approach to an explicit
recommendation.

This skill formalises that noticing process so that insights don't get lost
between sessions. Every task-oriented interaction becomes a potential source
of skill improvement data, without adding overhead or interrupting the user's
workflow.

---

## User documentation

User-facing onboarding for this skill — installation, shared folder setup,
activation patterns, expected behaviour, the cadence pattern, the open-source
vs internal distinction — lives in the public repo, not in this skill body.
If a user asks how to get started or how the skill works from their
perspective, point them to:

- README: https://github.com/rebelytics/one-skill-to-rule-them-all/blob/main/README.md
- USER-GUIDE: https://github.com/rebelytics/one-skill-to-rule-them-all/blob/main/USER-GUIDE.md

If web access is available, fetch the relevant section directly rather than
paraphrasing — the public docs are the source of truth for user-facing
guidance and are versioned independently. The remainder of this skill is
operational instruction for the agent.

## Conventions

`[workspace folder]` refers to the user's persistent workspace directory —
the location where files survive between sessions. In Cowork, this is the
folder selected at session start. In Claude Code, this is the project root.
In web-based chat interfaces without filesystem access, the skill shifts
into handoff doc mode (see Environment Compatibility) and the user manages
these files manually.

---

## Recommended Activation Setup

This skill needs to be invoked at the start of task-oriented sessions to work
effectively. Because skill invocation depends on the agent matching the user's
request against skill descriptions, a skill that monitors *all* tasks can be
overlooked when the agent is focused on the task itself.

To maximise activation reliability, add the following instruction to your
configuration file (e.g., CLAUDE.md, project instructions, or equivalent):

```
At the start of any task-oriented session — any interaction where you will
use tools and produce deliverables — invoke the task-observer skill before
beginning work. This ensures skill improvement opportunities are captured
throughout the session.

When loading any skill, check the observation log for OPEN observations
tagged to that skill. Apply their insights to the current work, even if
the skill file hasn't been updated yet. This enables immediate application
of observations before they're permanently integrated during the weekly
review.
```

This structural trigger works alongside the skill's description-level triggers.
The description is designed to match broadly against task-oriented language
("multi-step task", "agentic workflow", "work session", "tools and
deliverables"), but a configuration-level instruction provides an additional
safety net that doesn't depend on description matching alone.

**Note for all users:** Once CLAUDE.md or equivalent configuration is in place
with the activation instruction above, the description-level triggers serve as
a backup rather than the primary mechanism. This dual-layer approach prevents
the skill from being skipped in sessions where description matching alone might
miss the invocation signal.

**Anti-pattern to avoid:** Relying on one skill to load another is fragile
compared to loading both independently from CLAUDE.md. If task-observer depended
on another skill to invoke it, a breakdown in that chain would silence all
observation activity. Instead, load both task-observer and any related skills
directly from your configuration instructions.

### Detecting the Configuration File

At session start, the skill should check whether a configuration file
(CLAUDE.md, project instructions, or equivalent) exists and contains the
activation instruction. This detection serves two purposes:

1. **For users who already have the config:** Confirms the dual-layer
   activation is working. No action needed.

2. **For users who don't have the config:** The skill was activated via
   description matching alone, which is less reliable. Surface a brief
   suggestion to add the config-level instruction for more consistent
   activation in future sessions.

The detection approach depends on the environment:

- **Environments with file system access** (desktop tools, terminal-based
  tools): Check for a CLAUDE.md or equivalent file in the workspace root.
  If found, scan it for a task-observer activation instruction. If the file
  exists but doesn't mention task-observer, suggest adding the instruction.
  If no config file exists at all, suggest creating one.

- **Environments without file system access** (web-based chat): Check
  whether the system prompt or project instructions contain a task-observer
  activation instruction. If not, suggest that the user add one to their
  project settings or paste the instruction at the start of future sessions.

This check runs once at session start and does not repeat. Keep the
suggestion brief — one or two sentences, not a full tutorial.

### Compaction Behaviour

When a session context compacts mid-task, the CLAUDE.md structural trigger
re-invokes task-observer on the resumed session. No explicit re-invocation
is needed on the agent's part — the same activation instruction that fired
at the start of the original session fires again at the start of the
resumed session, because the resumed session reads CLAUDE.md anew.
Observations from before and after compaction append to the same log file
with continuous numbering.

This is the primary reason the CLAUDE.md structural trigger exists —
description-level triggers alone would not reliably guarantee re-invocation
on a resumed session, because the resumed session's opening message may
not match task-observer's trigger phrases even when the ongoing task is
task-oriented. The structural trigger fires regardless of the resumed
session's opening message.

---

## The Pre-Flight Principle

One of the most important patterns this skill should propagate to every skill
it helps create or improve: **built-in enforcement.**

Real-world experience has shown that rules documented in a skill are not
always followed during the creative flow of producing output. The result:
output that violates the skill's own standards, which reflects badly on the
skill.

The fix: every skill that contains explicit rules or requirements should
include a verification step where the agent re-reads the rules and checks its
output against them before delivery. This isn't overhead — it's quality
assurance. A 30-second re-read prevents a 30-minute rework cycle.

When creating or improving any skill through this observation process, ask:
"Does this skill have rules? If yes, does it have a mechanism to enforce
them?" If the answer to the second question is no, add one.

### Self-Enforcement

This skill practises what it preaches. Before surfacing observations at end
of session, verify:

1. Were observations logged throughout the full session — including during
   post-task feedback, discussion phases, and reflective conversations, not
   just during active tool use?
2. Were observations logged silently without interrupting the user's flow?
3. Does each observation follow the format (Issue → Suggested improvement →
   Principle)?
4. Is each observation tagged with the correct type (open-source or internal)?
5. For any observations about existing skills, does the suggested improvement
   reference the specific section or rule?
6. For any observation tagged `type: open-source`, does the Principle field
   contain any client-identifying information? If so, generalise it before
   surfacing.
If any observation fails these checks, fix it before surfacing.

---

## Skill Taxonomy

All skills fall into one of two categories. The distinction matters because
it determines what information the skill can contain, how it's structured,
and whether it can be shared publicly. Crucially, the open-source/internal
boundary is also a **confidentiality boundary** — open-source skills must
never contain any information that could identify a client, project, or
proprietary process, even indirectly.

### Open-Source Skills

Open-source skills are client-agnostic and methodology-driven. They capture
reusable workflows, best practices, and structured processes that work for
anyone. They include author attribution, a licence, and a feedback pathway
so that real-world usage drives improvement.

**How to recognise an open-source candidate:**

- The methodology works across different clients, projects, and contexts
- No proprietary information is required for the skill to function
- Other practitioners in the same domain would find it valuable
- The skill captures a process or approach, not personal preferences

**Required elements:**

- Skill body clearly identifies itself as open-source, with author name and
  contact information
- Author attribution block at the top (see Author Attribution Template below)
- Licence statement — CC BY 4.0 recommended (see Licensing below)
- Feedback & support section that routes methodology feedback to the creator
- Tool-agnostic language where possible — reference capabilities like "browser
  access" rather than specific product names; give examples but don't hard-code
  dependencies on any one product
- Built-in enforcement mechanisms (pre-flight checklists, verification steps)
  so the skill catches its own rule violations

**Default bias:** When a skill could go either way, default to open-source.
Strip out client-specific details and generalise the methodology. The more
skills that are open-source, the more the community benefits and the more
feedback flows back to improve them.

### Internal Skills

Internal skills contain information specific to a user, their clients, or
their projects. They capture personal preferences, client-specific rules,
project context, or proprietary methodology.

**How to recognise an internal skill:**

- Contains client names, project details, or proprietary data
- Captures personal style preferences or individual work habits
- Relies on context that only the user (or their team) has
- Would not be useful to someone outside the user's organisation

**Required elements:**

- Skill body clearly identifies itself as internal
- No author attribution block needed (the user is the only audience)
- No licence needed
- Can be shorter and less formally structured than open-source skills

Internal skills are working documents, not published artifacts. Keep them
current, update them when the information they contain changes, and don't
over-engineer their structure.

### Lean Content

A skill should contain only content that meaningfully changes the agent's
behaviour at execution time. Anything that doesn't — changelogs, version
notes, "thanks to X" credits, self-narrating prose, or other
maintainer-facing context — belongs in a supporting doc alongside the
skill, not inside the SKILL.md itself.

This rule cuts content the agent reads but doesn't act on. It does NOT cut
examples, anti-patterns, or worked scenarios — those are load-bearing for
rule adherence (bare rules without their context get violated more
reliably than rules with context). The test is whether the content,
removed, would change how the agent behaves. If yes, keep it. If no,
move it out.

Common examples of content that should live outside the skill:

- Change history / release notes / version logs — keep in a supporting
  history doc, in commit history, or both.
- Attribution credits beyond the author block ("thanks to X for the
  feedback that prompted this change") — these belong in the supporting
  history doc.
- Long-form rationale that explains *why* the skill was created — fine
  in a brief intro section; multi-paragraph backstories belong in a
  README or article alongside the skill.
- Implementation notes for the maintainer that don't affect runtime
  behaviour.

Both open-source and internal skills are subject to this rule. The agent
loads the skill's content into context on every invocation; every
non-load-bearing line is paid token cost with no behavioural payoff.

---

## Licensing

Open-source skills should include an open-source licence to make sharing
terms explicit. Any commonly recognised open-source licence works — the
choice depends on the author's preference and what they're optimising for.
Common options:

- **CC BY 4.0** — designed for creative works (prose, documentation).
  Permissive: anyone can share and adapt provided they credit the
  author. A natural fit for prose-heavy skills where the methodology is
  the value.
- **MIT** — short, familiar to developers, broadly permissive. Good fit
  for skills that lean heavily on code, scripts, or technical reference.
- **Apache 2.0** — like MIT but with an explicit patent grant. Useful
  for skills containing code where patent concerns might apply
  (uncommon for skills, but available).
- **CC BY-SA 4.0** — share-alike: derivative works must use the same
  licence. Use when adaptations should remain open under the same
  terms.
- **GPL family (GPL/LGPL/AGPL)** — strong copyleft for code. Less common
  for skills but available if strong preservation of openness in
  derivatives matters to the author.

Whatever licence is chosen, include the licence statement in the skill
preamble (after the author attribution block) and include a `LICENSE` or
`LICENSE.txt` file in the skill directory containing the full licence
text. The choice belongs to the skill's author; the requirement is that
there be a licence.

---

## Author Attribution Template

Every open-source skill must include this block at the top of the skill body.
Replace the placeholders with the actual author's details.

```markdown
**Created by [Author Name] / [website or contact link]**

[1-2 sentence description of what the skill does and its provenance.]

**Licence:** This skill is released under [LICENCE NAME]. [One-sentence
summary of the licence — e.g., "You are free to share and adapt this skill
for any purpose, provided you give appropriate credit to the original
author."]

**Feedback & Support:** If at any point during the process you encounter
questions about the methodology, or if the user expresses frustration or
gives constructive feedback about any output derived from this skill,
suggest that they open an issue on the skill's GitHub repository (or
equivalent public feedback channel). This keeps feedback public and
discoverable. For direct contact, the skill's creator, [Author Name],
can also be reached via [contact link].

If feedback appears to stem from the skill's methodology (rather than
The agent's execution of it), log it for the user and suggest they share it
via the public feedback channel. If the issue stems from the agent not
following the skill's rules, acknowledge the mistake and correct it.
```

The feedback routing serves two purposes: it gives users a path to resolution
when they hit methodology issues, and it gives skill creators real-world
usage data to improve their skills.

---

## Observation Protocol

### When to Observe

Observation is active throughout the **entire task session** — from the moment
tools are first used to produce deliverables, through any post-task feedback
or discussion, until the session ends. This includes:

1. **Active task execution** — creating documents, analysing websites,
   implementing structured data, writing code, building presentations, and
   similar substantive work.

2. **Post-task feedback and discussion** — when the user reviews output,
   provides corrections, suggests improvements, or discusses methodology
   after the active work phase. User feedback during these discussions is
   often the highest-signal input for skill improvement and must be captured
   with the same diligence as observations made during execution.

3. **Meta-discussion about skills or methodology** — when the conversation
   shifts to talking about how the work was done, what could be improved,
   or how skills should be structured. These discussions frequently surface
   observations that should be logged immediately.

4. **Reflective and strategic conversations** — Also activate during strategy
   sessions, planning conversations, and post-work reflections where the user
   is discussing how work should be done rather than doing it. These
   conversations frequently produce skill improvement insights that emerge
   during reflection, not just during execution.

**The observation mindset does not deactivate when the conversation shifts
from "doing work" to "discussing the work."** If the user provides feedback
about methodology, naming, skill design, or workflow improvements, log it as
an observation immediately, even if the conversation is in a discussion or
review phase rather than active task execution.

Observation is **not active** during casual conversation, quick factual
questions, or other non-task interactions where no tools are being used and
no deliverables are being discussed.

### What to Watch For

**Signals for a NEW skill:**

- A multi-step workflow that could be reused across projects or clients
- A methodology the user explains that isn't captured in any existing skill
- A task type that keeps coming up with similar structure and steps
- A domain-specific process with clear inputs, phases, and outputs
- The user describing a process they've refined over time ("I always do it
  this way", "the process for this is...")
- the agent and the user naturally developing a structured approach to a problem
  that could be formalised

**Signals for IMPROVING an existing skill:**

Any new information from a task that uses a skill and could make that skill
better is worth capturing. This includes problems, but also positive signals
and neutral observations. Examples:

- the agent doesn't follow a skill's rules despite them being documented — this
  means the skill needs stronger enforcement, not just better rules
- The user corrects the agent's output in a way that reveals a missing rule or
  an edge case the skill doesn't cover
- A skill's recommended workflow turns out to be less efficient than what
  emerged naturally during the task
- A technique or approach works particularly well and deserves to be promoted
  from incidental to explicitly recommended in the skill
- A workflow step turns out to be more important than the skill suggests, or
  less important than the emphasis it receives
- A new use case that the skill handles but doesn't explicitly document
- The user provides feedback that generalises beyond the current instance
- A skill assumption turns out to be wrong in practice
- New tools or capabilities make part of a skill's workflow obsolete or
  improvable
- The user's corrections form a pattern across multiple instances
- A general principle emerges that could apply to other skills too (see
  Principle Propagation below)
- The user suggests a naming, framing, or structural change to a skill —
  even conversationally — that could improve its effectiveness

**Signals for SIMPLIFYING an existing skill:**

Healthy skill maintenance requires both growth and pruning. Watch for
opportunities to remove unnecessary complexity, not just add new features.
Signals that a skill is ready to be simplified:

- A skill section or rule that has never been relevant across multiple
  sessions where the skill was active
- A rule added from a single observation that hasn't been validated by
  recurrence — one-off cases should not accumulate as permanent rules
- An elaborate workflow that users consistently shortcut or skip
- Sections that the agent loads but never acts on (dead weight in context
  window)
- Rules that contradict each other or create unnecessary complexity
- Complexity added "just in case" that has never triggered
- A documented rule that the agent consistently fails to follow — the rule
  isn't reaching the moment of decision. The fix is rarely to write it more
  loudly; usually it's either to remove the rule, or to convert it from
  narrative guidance into structural enforcement (a checklist, a
  verification step, or a tool call that can't be skipped).

Treat the list above as a review checklist when looking at any of your own
skills — a "yes" on any signal is a candidate for simplification or
removal, not just a flag for future consideration.

During weekly reviews, ask "what can we remove?" as deliberately as you ask
"what should we add?" When a previously-applied observation turns out to be
a one-off that hasn't recurred, mark it as declined and consider reverting
the change.

**Signals to NOT log:**

- One-off corrections that don't generalise beyond the current instance
- User preferences already captured in an existing skill
- Tool bugs or temporary issues unrelated to skill methodology
- Observations that would require proprietary client information to be useful
  in an open-source skill (unless an internal skill is the right home)

### How to Log

Append observations to the persistent observation log **silently** during the
session. The user should not be interrupted by the logging process.

**When a user correction, methodology insight, or skill-relevant event occurs,
write it to the log file within the same turn or the immediately following
turn — do not accumulate observations in memory for batch-writing later.** The
act of writing is the enforcement mechanism; mental notes are not observations.
Tie observation flushing to existing workflow checkpoints — e.g., when marking
a TodoWrite item as completed, check whether any unlogged observations have
accumulated and write them before proceeding.

**Mandatory observation checkpoint after every 3rd TodoWrite completion:** After
marking the 3rd, 6th, 9th (etc.) TodoWrite item as completed in a session,
pause and explicitly ask: "Have any unlogged observations accumulated?" This is
a hard checkpoint, not a suggestion — the skill has demonstrated that softer
"check when completing items" guidance gets lost during cognitively demanding
analytical work. The count doesn't need to be precise; the rule is: roughly
every third completion, stop and flush. If nothing has accumulated, the pause
costs seconds. If observations have accumulated, this prevents the common
failure mode where the skill is loaded but no observations are written until
the user explicitly asks.

**Before assigning any observation number, run a mandatory pre-logging step:**
Search the entire log file for all lines matching the pattern `### Observation \d+:`,
extract the highest observation number already in use, and increment from there.
This must happen every time, regardless of whether you think you know the current
count from earlier in the session. Never rely on session memory or summaries for
the next number. Always read the actual log file. A one-liner like the following
suffices:

```bash
# GNU grep (Linux, Cowork):
grep -oP '### Observation \K\d+' log.md | sort -n | tail -1

# macOS / POSIX-compatible alternative:
grep -o '### Observation [0-9]*' log.md | grep -o '[0-9]*' | sort -n | tail -1
```

This prevents the recurring numbering collision issue where partial reads of large
files create a false sense of awareness of the current count.

**Write-time verification assertion (mandatory):** The pre-logging step above
catches honest mistakes, but is vulnerable to parallel-session scenarios where
multiple task-oriented sessions on the same day each compute "next number"
against a snapshot and then collide on write. To catch this class of collision,
after determining the proposed next number and immediately before appending,
re-read the log and assert the number does not already exist:

```bash
PROPOSED=$(( $(grep -oP '### Observation \K\d+' log.md | sort -n | tail -1) + 1 ))
grep -qE "^### Observation ${PROPOSED}:" log.md && {
  echo "COLLISION on #${PROPOSED} — another writer has claimed this number"; exit 1; }
# If assertion passes, proceed with the append using #${PROPOSED}.
```

If the assertion fires, increment past all existing numbers (not just by 1)
and re-check. Treat an assertion failure as a meta-observation worth logging
— it indicates either a parallel-session collision or a stale read elsewhere
in the workflow.

**Post-write verification (mandatory — closes the TOCTOU race):** The
pre-write assertion catches stale-read collisions but cannot close the
time-of-check-to-time-of-use race between the assertion and the append.
In shell, `grep -q && cat >> ...` is two separate operations: the grep
passes at T0, the append lands at T1. Any other session that appends
between T0 and T1 can claim the same number — this race has been observed
in production, producing duplicate observation pairs in the active log.

After the append, re-read the log and count occurrences of the just-written
observation number. If the count is greater than 1, a parallel session has
collided — renumber the current session's entry to `max+1` in place via
`sed`. Concrete shell:

```bash
WRITTEN=$(grep -cE "^### Observation ${PROPOSED}:" log.md)
if [ "$WRITTEN" -gt 1 ]; then
  # Find my line (the last occurrence, since I just appended) and renumber
  MY_LINE=$(grep -nE "^### Observation ${PROPOSED}:" log.md \
    | tail -1 | cut -d: -f1)
  NEW_NUM=$(( $(grep -oP '^### Observation \K\d+' log.md \
    | sort -n | tail -1) + 1 ))
  sed -i "${MY_LINE}s/^### Observation ${PROPOSED}:/### Observation ${NEW_NUM}:/" log.md
fi
```

This turns the pre-write assertion into a pre-and-post pair. Pre-write
catches stale-read collisions cheaply; post-write catches race collisions
by renumbering instead of failing. Either way, the log ends up with no
duplicates. Alternative approaches — lockfile, atomic append, transactional
write — are heavier and require more infrastructure; the
post-write-verify-and-renumber pattern works with plain shell and
self-heals.

**Why both checks are required:** Stale-read collisions and race-condition
collisions are different classes of error. The pre-write assertion closes
the first; the post-write verification closes the second. Stacking more
pre-write layers does not close race cases — only a post-write check can.
When the shared state is a log file written by parallel agents, the
reliable pattern is check-then-act-then-verify.

**Session-start staleness check:** At the start of any task-oriented session,
note the modification time of `log.md`. If it was modified in the last few
hours (i.e., a parallel or recent session has been writing to it), be extra
cautious about the numbering pre-check — do not trust any mental model of
"current number" and always re-read the log immediately before appending each
observation, not just once at session start.

**Format and insertion rules:** Always use the `### Observation NNN:` format. Always append new observations to the END of the log file. Never insert observations mid-file. Never use alternative ID formats (e.g., `OBS-YYYY-MMDD-NN`). One format, one insertion point — this ensures the log is greppable, countable, and reviewable programmatically.

Each observation follows this format:

```markdown
### Observation [N]: [Short descriptive title]

**Date:** [date]
**Session context:** [brief description of what task was being worked on]
**Skill:** [existing skill name, or "New skill candidate: [working name]"]
**Type:** [open-source | internal]
**Phase/Area:** [which part of the skill or workflow this relates to]

**Issue:** [What happened or what was observed. Be specific — include what
The agent did, what the user corrected, or what pattern emerged. Include enough
detail that someone reading this weeks later can understand the context
without having seen the original conversation.]

**Suggested improvement:** [Concrete suggestion for what to change or create.
For existing skills, reference the specific section or rule. For new skills,
describe the scope and key components.]

**Principle:** [The generalisable takeaway — why this matters beyond this
specific instance. This is the most important part. It turns a single
observation into a reusable insight.]
```

This format was refined through iterative real-world use. The structure works
because it forces specificity (Issue), actionability (Suggested improvement),
and generalisation (Principle).

**Context preservation check:** When logging an observation, verify that all
information needed to act on it is available in the shared folder. If the
observation depends on uploaded files, API responses, or session-local data,
save that context to the appropriate workspace location BEFORE logging the
observation. Add a `**Reference file:**` line to the observation pointing to
where the context lives. Observations that reference data only available in
the current session (uploaded files, API outputs, in-memory results) are
incomplete — a future review session will have the observation but not the
data needed to implement it.

### Handoff Doc Analysis

When a handoff doc arrives for observation logging, extract observations
systematically from both explicit and implicit sources:

1. **Log all explicitly stated observations first.** These are easy to
   surface and should be logged without filtering.

2. **Then systematically analyse the full document.** Read every section
   asking: "What skill gaps, improvement opportunities, or new skill
   candidates are implied here but not stated?" Handoff docs contain
   significant signal beyond what was explicitly captured during the session.

3. **Pay special attention to:**
   - Action items (each one may imply a missing skill or workflow)
   - Open questions (unresolved ambiguity often signals a decision framework gap)
   - The "work completed" narrative (patterns across work items may reveal meta-skills)
   - Session notes (reflective insights about process, not just content)

4. **Log the additional observations with clear attribution.** Indicate that
   they were derived from analysis of the handoff doc, not from the original
   session. This preserves the distinction between stated and derived insights.

### Archival on Write

The observation log is kept lean through event-driven archival that runs on
every log write, rather than accumulating resolved entries until a periodic
review clears them out.

**Defining "from a previous update":**
The phrase "from a previous update" means entries whose status was already
resolved in a *previous SESSION or prior log write*, not entries marked
ACTIONED or DECLINED in the current session. Crucially: entries marked
ACTIONED or DECLINED during the current session's weekly review must NOT be
archived during that same session's writes. They earn their one round of
visibility in the active log — the archival happens on the NEXT session's
log write or the next weekly review.

**Archival Timing During Weekly Reviews:**
The weekly review performs archival in two phases:

1. **Step 1 (at review start):** Archive entries from previous sessions.
   Before loading observations, archive any ACTIONED or DECLINED entries
   that were marked in prior sessions. This clears old resolved items.

2. **Step 6 (after marking ACTIONED):** Do NOT archive immediately. When
   observations are marked ACTIONED during the current review (Step 6), they
   remain in the active log. Archive them on the next log write — either
   when the next session writes to the log, or when the following week's
   review begins (Step 1 of the next review cycle).

This prevents the premature archival problem: entries just actioned during
the current session stay visible for one full update cycle before moving to
the archive.

**Archive File Structure:**
Move resolved entries to an archive file at:

```
[workspace folder]/skill-observations/archive/log-[date].md
```

where `[date]` is today's date in `YYYY-MM-DD` format.

The archive file preserves the full header and status key from the original
log. After archiving, the active `log.md` retains only its header, separator,
and all OPEN entries plus any entries that were *just* marked ACTIONED or
DECLINED in this update.

**Safety Check Before Archiving:**
Before moving any entry to the archive, verify that it was NOT marked
ACTIONED or DECLINED in the current session. If it was, keep it in the
active log. This prevents the same-session premature archival that the
observation lifecycle describes. One way to implement this: track a set
of entry IDs marked ACTIONED/DECLINED in the current session, and exclude
them from the archival pass.

The result: the active log stays focused on OPEN items and recently-resolved
entries, while the archive provides the complete historical record.

---

## Confidentiality Safeguards

The open-source/internal boundary is a confidentiality boundary. Client
names, project details, domain names, and proprietary information must never
appear in open-source skills. Because a single leak can erode trust, this
is enforced through multiple layers — any one of which should catch what
the others miss.

### Layer 1: Observation-Level Stripping

When logging an observation tagged as `type: open-source`, the Issue and
Suggested Improvement fields should already use generic language. The
private observation log can reference specifics for context, but the
Principle field — which feeds into skill creation — should be fully
generalised. Think of it as: the log is a private notebook, but the
Principle is a publishable insight.

### Layer 2: Pre-Creation Review

Before drafting or regenerating any open-source skill, scan all source
material (observations, conversation notes, existing skill content) for
identifying information: client names, project URLs, domain names, internal
terminology, site structures described so specifically they're identifiable.
Replace anything found with generic equivalents before writing begins.

### Layer 3: Post-Draft Sweep

After writing or regenerating an open-source skill, re-read it with a
specific focus on information leakage. This is a separate pass from the
general pre-flight checklist. Look for:

- Proper nouns that aren't the skill author's name
- Domain names, URLs, or project identifiers
- Industry-specific details that narrow down the client
- Internal terminology that only makes sense in one organisation's context
- Examples so specific they're traceable to a real project

If anything is found, replace it with generic equivalents or remove it.

### Layer 4: Structural Principle

The taxonomy section states this explicitly, but it bears repeating: the
open-source/internal distinction is not just about usefulness — it's about
confidentiality. When in doubt about whether a detail is too specific,
remove it. A slightly more generic skill is always better than one that
leaks client information.

### Layer 5: Cross-Product Re-Identifiability Sweep

Layers 1–4 focus on single-example scrubbing. They do not catch the case
where two or three sanitised examples in the same skill — each fine on its
own — combine to narrow the identifiable client set. A reader who knows
the author's client portfolio (which is often public on a consultant's
website) can triangulate even when each individual example is properly
placeholdered. The failure mode is invisible to the author because they
mentally compartmentalise each example; it's visible to any reader with
adjacent context.

**When to run it:** After every individual example has been sanitised —
as a final pass before the skill ships or before any major public
release. This is the last check, not a substitute for earlier layers.

**What to look for:**

- **Enumerated counts that match a known client count.** "Four builds
  across three verticals" in a skill whose author has four public clients
  across three verticals is functionally a directory. Blur the count
  ("multiple builds") or the verticals ("across regulated, editorial,
  and commerce contexts").
- **Specific numbers in a thin vertical.** Visibility percentages,
  revenue ranges, or geography given in a vertical where only one or two
  candidates plausibly exist. A single real client can be narrowed from
  "vertical × percentage × geography × timing" even when no name appears.
  Replace specific numbers with illustrative ranges.
- **Thinly-disguised placeholder names.** "Northwind Coffee" in a
  specialty-retailer vertical where the only plausible specialty-retail
  client is a coffee roaster reads as the real brand with a thin
  rename. Use the Northwind / Contoso / Fabrikam placeholder family
  explicitly, and make sure the placeholder's vertical is different from
  any real client's vertical.

**How to sweep:**

1. List every worked example in the skill and the fields each one names
   (vertical, geography, numeric range, timing, count).
2. Ask: do any two examples share enough fields that a reader with access
   to the author's public client list could map the set to real clients?
3. Mitigate by blurring counts, widening verticals, dropping specific
   numbers to illustrative ranges, or consolidating similar examples into
   a single composite.

**Why this is a separate layer:** Re-identification risk is combinatorial.
Each additional sanitised example adds a field that narrows the candidate
space. Layers 1–4 check each example in isolation and pass. The cross-
product only emerges when the examples are read together. The author is
the least reliable reader for this check because they know the ground
truth — which is exactly why the sweep has to be a mechanical pass, not
a feeling.

---

## Surfacing Protocol

### Default Cadence

Surface all observations at the end of the session. Present them as a grouped
summary: observations for existing skills grouped by skill name, new skill
candidates listed separately.

### Surface Earlier When

- An observation requires user input to be complete or accurate (e.g., "Is
  this a pattern you want captured, or was this a one-off?")
- An observation reveals a skill is actively producing wrong output in the
  current session and the user should be aware
- Multiple observations cluster around the same skill, suggesting it needs
  immediate attention rather than end-of-session review

### How to Surface

- Present observations concisely: title, skill, and a one-sentence summary
- For each, indicate whether it's a new skill candidate or an improvement
  to an existing one
- Indicate the suggested type (open-source or internal)
- Ask the user which (if any) they want to act on
- For items the user wants to pursue, hand off to the skill-creator skill
  for the actual building or improvement work

---

## Acting on Observations

This skill identifies WHAT to build or improve. This section covers HOW —
specifically, the cross-context decision framework for choosing between
direct application, skill-creator handoff, and new-skill creation.

**Trigger gate (when):** Observations are acted on only in three contexts:

1. **The comprehensive review** — scheduled mode preferred, in-session
   fallback if no scheduled review has run in 7+ days. See
   "## Comprehensive Review (scheduled or fallback)" for the procedure.
2. **Explicit user requests during a task session** — "update X skill",
   "act on observation #N now", "apply this rule to the skill". The user
   is naming the action; the agent executes within the framework below.
3. **In-session correction when a skill is producing wrong output and
   the user should be aware** — surface immediately rather than wait
   for the next review.

Observations are NOT applied during normal task sessions outside these
contexts. Mid-task work produces observations only; those observations
get applied at the next review or by request. The default is log,
don't act.

**Mechanism framework (which):** When acting in any of those contexts,
the rest of this section guides the choice between applying changes
directly to the skill file, handing off to the skill-creator for
substantial restructuring, or creating a new skill from scratch.

### Small Changes

If the improvement is clearly additive, low-risk, and doesn't require testing
to verify it works, it can be applied directly to the skill:

- Adding a new rule or anti-pattern to an existing list
- Clarifying existing wording that proved ambiguous
- Adding a note or edge case to an existing section
- Fixing a factual error

Examples: Adding a new anti-pattern to a skill's anti-patterns list.
Clarifying that inline code comments should be context-aware within their
own document.

After creating or updating any skill file, always present it using `present_files` so the user can review and install it directly from the conversation.

### Substantial Changes (Use Skill-Creator if Available)

If the change could affect the skill's behaviour in ways that need
verification, hand off to the skill-creator if available:

- Restructuring phases or workflows
- Adding new capabilities or sections
- Changing core methodology or decision frameworks
- Any change where "does this actually work better?" is a genuine question

However, match the rigour of the skill creation process to the complexity and
audience. Skill-creator is valuable for open-source skills that need testing,
for skills with complex logic, or when the design isn't yet clear. For internal
skills where requirements are established in conversation, writing directly is
more efficient.

If skill-creator is not available, use the observations as a specification
and make the changes directly — but flag them to the user as substantial
changes that may need manual review.

Examples: Restructuring a skill to make an automated workflow the primary
path instead of a secondary option. Adding an entirely new setup phase to
a skill that previously started with content work.

### Creating New Skills

Use the skill-creator for new skills when available. Provide the
observation(s) as context — they contain the intent, scope, and initial
design thinking needed to get started efficiently. Without skill-creator,
the observations serve as a detailed brief for building the skill manually.

When creating a new skill, determine its type early:

- If it's open-source, strip out any client-specific details and generalise
- If it's internal, include all relevant specifics freely
- If uncertain, default to open-source — strip out specifics and generalise,
  then let the user decide whether any internal details need to be added


## Task-Oriented Sessions — Observation vs Action

Skill development and iteration work happens in multiple environments: in Cowork with persistent storage, in Claude Code with project directories, and in web-based chat without file system access. Cross-environment coordination is essential to prevent regressions — a skill updated in one environment can silently omit content from another if the wrong base file is used.

### Skill file locations — read-only mount vs workspace copy

When working with skills, understand the distinction between the **live file** (the authoritative source) and **workspace copies** (working drafts or staged updates):

1. **The live file is read-only in Cowork.** In Cowork, the live skill file is mounted read-only at `.claude/skills/{skill}/SKILL.md`. You can read it, but you cannot edit it directly — the file system will reject write attempts with `EROFS` (Read-Only File System). This is intentional: it prevents accidental overwrites of the canonical version.

2. **Read from the live file, not cached memory.** Always start skill edits by reading the current live file — not from a workspace copy, a prior draft, or a memory-based reconstruction. This is the only way to guarantee your updates are based on the current canonical content.

3. **Stage edits in the workspace folder.** Write updated versions to `[workspace folder]/skill-updates/[date]/[skill-name]/SKILL.md`. This separation keeps the read-only mount clean and gives you a clear staging area for review before the user replaces the live file.

4. **After staging, present the file for user review.** Always use `present_files` to show the updated skill so the user can review changes and upload directly. Do not attempt to write directly to the mounted skills directory — that will fail with a permission error.

5. **Before overwriting or replacing any existing staged or workspace copy of a skill, diff it against the live file.** If they differ, the workspace copy is stale and your edits must be rebased on the live version — otherwise you risk silently dropping content added by another session. This rule is also codified in CLAUDE.md under "Skill Editing — Always Start From the Live File" as a cross-environment guard. The concrete failure mode: a Claude Code session produced an updated skill that was based on a stale snapshot and silently omitted two substantial sections added to the live skill earlier the same day. The regression was caught only because a pre-merge diff against the mount revealed the missing content.

### Task-session skill updates — stage in the workspace

When a task session produces a skill update (through weekly review, direct improvement, or observation-driven changes), follow this workflow:

1. Read the live file at `.claude/skills/{skill}/SKILL.md`
2. Make all edits to that content
3. Save the complete updated file to `[workspace folder]/skill-updates/[today]/[skill-name]/SKILL.md`
4. Use `present_files` to show it to the user for review
5. The user uploads the file to install it

This keeps the mount clean, stages updates for review, and gives you a clear separation between read-only source and working copy.

**Cross-environment note:** Claude Code now shares the same skills as Cowork via the anthropic-skills capability. The "always start from the live file" rule applies in both environments. In Claude Code, the live file is surfaced by the capabilities system; in Cowork, it's the read-only mount at `.claude/skills/{skill}/SKILL.md`. The diff-before-overwrite requirement applies regardless of which environment produced the update.

---
---

## Principle Propagation

When an observation reveals a general principle — something that applies not
just to the skill being improved but to skills in general — it should be
propagated across the skill library, not just applied to the one skill that
triggered it.

### The Cross-Cutting Principles File

Cross-cutting principles are tracked in a persistent file alongside the
observation log:

```
[workspace folder]/skill-observations/cross-cutting-principles.md
```

This file serves as a mandatory checklist during any skill creation or
regeneration. Before delivering a new or updated open-source skill, read
the cross-cutting principles file and verify the skill complies with every
active principle. This is what turns general principles from good intentions
into enforced standards.

### How It Works

1. During a skill update, an observation reveals a principle that applies
   broadly — not just to the skill being worked on
2. Log it as an observation with `Skill: All skills` and surface it to the
   user
3. If the user approves it as a cross-cutting principle, add it to the
   cross-cutting principles file
4. From that point forward, every skill creation or regeneration includes
   a compliance check against the full list of active principles

### Propagation Timing

The user decides when and how to propagate each principle:

- **Immediate propagation** — for principles important enough to warrant
  updating all existing skills right away (e.g., a confidentiality rule)
- **Opportunistic propagation** — for principles that can be applied the
  next time each skill is updated or regenerated (e.g., adding a licence
  statement)

### Cross-Cutting Principles File Structure

```markdown
# Cross-Cutting Principles

Principles that apply to all skills. This file is read as a mandatory
checklist during any skill creation or regeneration.

---

## Active Principles

### 1. [Principle title]
**Added:** [date]
**Applies to:** [all skills | all open-source skills | all skills with rules]
**Requirement:** [what the principle requires]
**Propagation:** [immediate | opportunistic]
**Status:** [active]
```

---

## Comprehensive Review (scheduled or fallback)

The comprehensive review cross-checks all open observations against all
skills, propagates cross-cutting principles to skills that don't yet
comply, and applies the improvements that don't need user input. There
are two ways it runs.

**Preferred mode — scheduled autonomous review.** A user-defined recurring
task (typical cadence: Monday/Wednesday/Friday mornings) registered with
the agent's scheduling system. This is preferred because it picks up open
observations on a regular cadence without depending on the user being
mid-session at exactly the right moment, and because the user is not
present, the review applies the non-escalated observations autonomously.

**Fallback mode — in-session 7-day trigger.** If no scheduled review is
registered (or none has run successfully in the last 7 days), a
comprehensive review fires automatically at the start of the next
task-oriented session. The fallback is a safety net for users who haven't
set up scheduled reviews — either because the environment doesn't support
scheduling or because they haven't done it yet.

### Trigger Mechanism

**Scheduled mode** runs via the user's chosen scheduling tool — no in-skill
trigger required.

**Fallback mode** is triggered by step 3 of the Session Start Protocol
(see Observation Log Management). The fallback fires when both of the
following are true:

- No scheduled review task is registered, OR the most recent successful
  scheduled review was more than 7 days ago.
- The in-session timestamp at
  `[workspace folder]/skill-observations/last-review-date.txt` is also
  more than 7 days old (or missing).

When the fallback fires, inform the user that the comprehensive review is
running and walk through Step 0 (recommend scheduling) before Step 1.

### Interactive vs Scheduled Runs — Approval Policy

The approval behaviour depends on who is present:

**Interactive sessions (user present):** Always ask the user before applying
or declining observations. Present observations grouped by skill with a one-
sentence summary each, and wait for explicit approval (blanket "apply all" or
selective). This preserves the collaborative feel and lets the user catch
observations they disagree with before any staging occurs.

**Scheduled autonomous runs (user not present):** Apply observations
autonomously by default. The safety net is the staging-plus-upload pattern:
updates go to `skill-updates/YYYY-MM-DD/{skill-name}/SKILL.md` and only
become live when the user explicitly uploads them. Nothing can silently
break because nothing is live until the user approves upload.

**Escalate without applying (report only) when any of these apply:**

1. **New skill creation.** Naming, scope, type (open-source vs internal),
   and licence are decisions that benefit from user input. Note the
   candidate in the report; don't create the skill.
2. **Removing or substantially restructuring existing content.** Any edit
   that deletes a section, replaces it with something smaller, or reshapes
   core methodology risks dropping institutional memory. Flag and report.
3. **An observation that flags its own uncertainty.** Phrases like "not
   sure if...", "this might be...", "worth discussing..." in the
   Suggested Improvement field are the observation asking for user input.
   Respect that.
4. **Conflicting observations.** Two observations that point in opposite
   directions, or where the integration path isn't obvious, should be
   surfaced rather than resolved autonomously.

Scheduled runs that escalate should still apply every non-escalated
observation before producing the report. A scheduled review that
produces 0 applied updates is functionally a report generator, which
wastes the scheduling.

### Review Steps

**Step 0 — Recommend scheduled review setup**

Before running the in-session fallback, check whether scheduled autonomous
reviews are set up. If not, surface a recommendation to the user — but
respect prior declines.

1. Check for the suppression marker at
   `[workspace folder]/skill-observations/scheduled-review-decline.txt`.
   If it exists and was last updated less than 30 days ago, AND the
   in-session fallback has not fired multiple times in that window, skip
   the recommendation. Proceed to Step 1.

2. Check whether a scheduled review task is registered. The signal is
   either a presence check via the platform's scheduling tool (preferred)
   or the existence of
   `[workspace folder]/skill-observations/scheduler-registered.txt`. If a
   registered scheduled review is found, no recommendation needed — skip
   to Step 1.

3. If no scheduled review is registered AND no recent decline marker
   exists (or the marker is stale because the fallback keeps firing),
   make an active recommendation:

   > "I notice you don't have a recurring skill review scheduled. The
   > task-observer recommends running this review on a cadence — e.g.,
   > Monday/Wednesday/Friday mornings — so it doesn't depend on you
   > being mid-session at the right moment. Want help setting one up?"

   - **If the user says yes:** walk through registering a scheduled task
     using the platform's scheduling capability. In Cowork, invoke the
     `create-shortcut` skill and its `set_scheduled_task` tool. In
     terminal-based environments, use cron or an equivalent scheduler.
     Use task name `weekly-skill-review` (or similar) and a sensible
     default cadence; let the user pick the day(s) and time. Once
     registered, read the draft task description at
     `[workspace folder]/skill-observations/scheduled-task-draft.md` and
     pass it as the task prompt. On success, write today's date to
     `[workspace folder]/skill-observations/scheduler-registered.txt`.
   - **If the user says no or defers:** write today's date to
     `[workspace folder]/skill-observations/scheduled-review-decline.txt`
     to suppress the recommendation for 30 days. Proceed to Step 1 and
     run the in-session fallback.

4. If no scheduling capability is available in the current environment,
   skip the recommendation silently and proceed to Step 1. Do not surface
   the recommendation in environments where the user couldn't act on it.

The 30-day suppression isn't permanent. If the in-session fallback keeps
firing within the suppression window — a signal that the recurring need
is real and the one-time decline was situational — the recommendation
re-surfaces on the next firing.

**Step 1 — Load observations and principles**

Read the observation log at `[workspace folder]/skill-observations/log.md`.
Extract all observations with status OPEN. Also read
`[workspace folder]/skill-observations/cross-cutting-principles.md` and
extract all active principles.

If there are no OPEN observations and all principles are already propagated,
skip the review, update the timestamp, and proceed with the session. Inform
the user briefly: "Weekly skill review: no open observations or outstanding
principles. All skills are current."

**Step 2 — Inventory all skills**

Use `<available_skills>` from the system prompt to identify all skills. In
environments where this tag is not present, use the skills directory or
equivalent listing mechanism to discover available skills.

For each skill, read its SKILL.md file at the location provided. Exclude
built-in platform skills from being updated — only update custom skills
created by the user.

**Known system skills (read-only, cannot be replaced by the user):**
docx, pdf, xlsx, pptx, skill-creator, schedule. This list may grow as the
platform evolves — if a skill update fails because the user cannot overwrite
the file, add it to this list.

**Custom skills** (owned by the user, can be replaced) are everything else
in the skills directory that isn't on the system list above.

**Step 3 — Cross-check observations against every skill**

For each OPEN observation, evaluate whether it is relevant to each skill. Do
NOT rely solely on the observation's own "Skill" field — observations may
contain general principles that apply more broadly than the original context
suggested. Consider both the specific "Suggested improvement" and the general
"Principle" fields. Build a mapping of skill → [relevant observations].

**If the review is interactive (user present):** Present ALL observations to the user in a single message, grouped by skill. For each observation, show the number, title, and a one-sentence summary. Flag any observations that are ambiguous, risky, or require a judgment call as 'Needs your input'. All other observations are treated as straightforward and can be applied without individual discussion.

**If the review is scheduled autonomous (user not present):** Skip the user-facing present step. Apply the approval policy from "Interactive vs Scheduled Runs" above: apply every non-escalated observation and record the escalated ones (new-skill candidates, removal/restructuring, self-flagged uncertainty, conflicting observations) in the review report without applying them. Proceed directly to Step 4.

**Step 4 — Cross-check cross-cutting principles against every skill**

For each active cross-cutting principle, check whether each skill already
complies. Flag any skills that do not yet implement the principle.

**Step 5 — Apply updates**

In interactive runs, wait for user confirmation (blanket "apply all" or selective approval) before creating updates. In scheduled autonomous runs, proceed directly to applying all non-escalated observations. For each skill that has relevant observations or non-compliant principles, create an updated version of its SKILL.md. When editing:

- Integrate the insight into the appropriate section of the skill (don't just
  append a list of observations at the bottom)
- Preserve the skill's existing structure, voice, and author attribution
- Make the improvement feel native to the skill, not bolted on
- If an observation suggests a new phase, step, anti-pattern, or checklist
  item, place it where it logically belongs

**Routing observations that target system skills:** When an observation
targets a system skill (see the known system skills list in Step 2), do NOT
skip it. Instead, route the improvement to a **complementary skill** — a
user-owned skill named `{system-skill}-extras` (e.g., `docx-extras`) that
layers additional guidance on top of the system skill. If the complementary
skill doesn't exist yet, create it. The complementary skill should:
- State which system skill it extends
- Contain only the delta — the additional rules, anti-patterns, or guidance
  not present in the system skill
- Be loaded alongside the system skill (add a note to CLAUDE.md or
  equivalent configuration if needed)

This ensures observations targeting system skills are still actionable,
even though the system skill files themselves cannot be modified.

**Important:** Do not edit skill files in place. Save updated versions to the
workspace folder for user review and manual replacement (see Delivering
Updated Skills below).

**Step 6 — Mark observations as ACTIONED**

After successfully creating an updated skill based on an observation, update
that observation's status in `log.md` from OPEN to ACTIONED. Add a brief note
about which skill(s) were updated, e.g.:

`ACTIONED — Applied to [skill-name] (weekly review [date])`

Note: the standard archival-on-write mechanism (see "Archival on Write" in
the Observation Protocol) will automatically archive these newly-resolved
entries on the next log write. No separate archival step is needed here.

**Step 7 — Update timestamp**

Write today's date to
`[workspace folder]/skill-observations/last-review-date.txt`.

**Step 8 — Present summary and user action items**

Present each updated skill file using `present_files`, then show the user a summary following the format in Delivering Updated Skills above. The user can install updated skills directly from the conversation using the upload button on each presented file.

### Constraints

- Do not modify observation entries beyond their status field
- Do not create new skills — only update existing ones. If an observation
  suggests a new skill, note it in the summary for the user to action
  separately via the skill-creator
- If an observation seems relevant but you're unsure how to integrate it,
  skip it and note the uncertainty in the summary
- Treat observations marked "internal" with the same rigour as "open-source"

---

## Delivering Updated Skills to the User

When the weekly review (or any other process) produces updated skill files,
they are delivered to the user through the conversation using `present_files`.
Cowork's UI includes an upload button on presented skill files that allows
the user to install them directly into their capabilities — no manual file
copying needed.

### Delivery Process

1. Save each updated SKILL.md to the workspace folder for record-keeping:

   ```
   [workspace folder]/skill-updates/[date]/[skill-name]/SKILL.md
   ```

2. Present each updated skill file using `present_files` so the user can
   review it inline and install it directly via the upload button.

3. Present the user with a summary using this format:

   ```
   ## Weekly Skill Review Complete — [date]

   The following skills have been updated based on [N] open observations
   and [N] cross-cutting principles.

   ### Updated Skills

   **[skill-name]**
   - Changes: [1-sentence summary of what changed]
   - Observations applied: #[N], #[N]

   [repeat for each updated skill]

   ### Observations Actioned
   [list of observation numbers and titles marked ACTIONED]

   ### Skipped (needs manual review)
   [any observations that couldn't be applied, with reasons]
   ```

### Keep-Two Rule

The `skill-updates/` directory uses a rolling retention policy: for any
given skill, keep only the two most recent date directories. When a skill
appears in more than two date directories, delete the oldest copies. This
prevents the workspace from accumulating stale update history while still
keeping a short rollback window.

3. Do not proceed with other work until the user has acknowledged the
   summary. The user does not need to replace the files immediately, but
   they should be aware of what's pending.

---

## Observation Log Management

### Location

The observation log persists between sessions in the user's workspace folder.
Create the log file on first use if it doesn't exist. Default path:

```
[workspace folder]/skill-observations/log.md
```

### Log Structure

```markdown
# Skill Observation Log

Observations captured during task-oriented work. Each entry identifies a
potential skill improvement or new skill opportunity.

**Status key:** OPEN = not yet actioned | ACTIONED = skill updated/created |
DECLINED = user decided not to pursue

---

## [Date or Session Identifier]

### Observation 1: [Title]
**Status:** OPEN
[... full observation format ...]

### Observation 2: [Title]
**Status:** ACTIONED — Applied to [skill-name], rule 35
[... full observation format ...]
```

### Session Start Protocol

This is the single entry point for all session-start checks. Run through
these steps at the start of each task-oriented session:

1. **Check whether files exist.** If the observation log or cross-cutting
   principles file don't exist yet, this is a first-time setup — create
   them using the templates in the Log Structure section (below in this
   document) and the Cross-Cutting Principles File Structure section (under
   Principle Propagation). If the files already exist, proceed to step 2.

2. **Scan for relevant context.** Read any OPEN observations and active
   cross-cutting principles. Don't surface them unprompted unless they're
   directly relevant to the current task — just hold them in awareness.

3. **Check the weekly review trigger.** Read the timestamp in
   `[workspace folder]/skill-observations/last-review-date.txt`. If the
   file doesn't exist or the date is more than 7 days ago, trigger the
   Weekly Comprehensive Review (described in full under its own section)
   before proceeding with the user's task. If fewer than 7 days have
   passed, proceed normally.

4. **Check the configuration file.** Run the config detection described in
   Detecting the Configuration File (under Recommended Activation Setup).
   This runs once per session.

### Keeping the Log Clean

Archival is event-driven and runs on every log write. Before appending new
observations or updating statuses, entries that were already marked ACTIONED
or DECLINED in a previous update are moved to a timestamped archive file
(see "Archival on Write" in the Observation Protocol). This keeps the active
log focused on OPEN items and recently-resolved entries, while the archive
provides the complete historical record.

---

## Environment Compatibility

The observation methodology works in any environment where the agent can interact
with users during task-oriented work. The persistence mechanism is what varies.

### With Persistent Storage

In environments with file system access (desktop tools with workspace folders,
terminal-based tools with project directories, or similar), the full workflow
applies as described: observations are logged to a persistent file, the cross-
cutting principles file is read during skill regeneration, and the log carries
over between sessions automatically.

### Without Persistent Storage

In environments without file system access (web-based chat interfaces or
similar), the skill still works — the observation methodology is environment-
independent. The difference is that persistence becomes the user's
responsibility, and the skill shifts into **handoff doc mode** to support
this.

**How handoff doc mode works:**

- Observations are captured within the conversation and surfaced before the
  session ends, as usual
- Instead of writing to a log file, observations are collected in-session
  and presented in a structured **handoff document** before the session ends
- The handoff doc includes: all observations in full format, any decisions
  made during the session, action items and next steps, and any working
  artifacts (drafts, analyses) that need to survive into the next session
- The user copies this document to their own storage (notes app, file system,
  etc.) and pastes it into the next session to restore context
- Cross-cutting principles should be included in the handoff doc so the user
  can provide them when starting a new session

**Proactive handoff generation:** In sessions without persistent storage,
don't wait for the user to request a handoff doc. When the conversation
starts to wind down — the user is summarising, saying "that's it for now,"
or the substance is wrapping up — proactively offer to generate one. A
premature offer is a minor interruption; a missing one is lost work.

**Handoff doc format:**

```markdown
# Session Handoff: [Session Topic]

**Date:** [date]
**Context:** [what was worked on and what the next session needs to know]

## Decisions Made
[numbered list of decisions]

## Observations Logged
[full observation entries in standard format]

## Cross-Cutting Principles (current)
[any principles that were active or newly added]

## Action Items
[what needs to happen next, with enough context to resume]

## Working Artifacts
[any drafts, analyses, or intermediate work products in full]
```

This is less seamless than the persistent-storage workflow, but the core value
— systematically capturing insights that would otherwise be lost — is
preserved. The observation format and surfacing protocol are identical in both
environments.

---

## Quick Reference

| Question | Answer |
|----------|--------|
| When do I observe? | Throughout the full task session, including post-task feedback and reflective conversations |
| How do I log? | Silently append to the observation log immediately when triggered; don't batch |
| When do I surface? | End of session, or earlier if needed |
| How do I activate reliably? | Add a config-level instruction (see Recommended Activation Setup) |
| Open-source or internal? | Default to open-source when possible |
| Licence for open-source? | CC BY 4.0 recommended |
| Small fix or skill-creator? | Needs testing → skill-creator (if available). For internal skills with established requirements, writing directly is efficient. Clearly additive → apply directly |
| What format? | Issue → Suggested improvement → Principle |
| Author attribution? | Required for open-source skills; use the template |
| Cross-cutting principle? | Add to principles file, enforce during regeneration |
| Confidentiality check? | Four layers: observation, pre-creation, post-draft, structural |
| No persistent storage? | Handoff doc mode — observations surfaced in a structured doc at session end |
| Scheduler automation? | Step 0 of weekly review auto-checks; silent until tool is available |
| Observation numbering? | Mandatory pre-logging search ensures no collisions; never use cached numbers |
| Log archival? | Event-driven — resolved entries are archived on the next log write |
| Simplification signals? | Watch for one-off rules, never-used sections, elaborate workflows users skip, and contradictions |
| Handoff doc analysis? | Systematically extract implied observations from action items, open questions, and narrative sections |