Skills & Progressive Disclosure

An agent definition is loaded on every invocation, whether the task needs it or not. Skills split the knowledge so only the slice the task requires ever enters the window.

Why this, for you: Lesson 2 fixed the instruction file — index, don't inline. This lesson applies the same move to an agent's own how-to knowledge: keep the definition to identity and scope, push procedures into on-demand skills. The payoff compounds hardest exactly where Lesson 4's fan-out spends the most — every spawned sub-agent that inherits a leaner definition.

A monolithic agent definition embeds every checklist and procedure it might ever need, and pays for all of them on every run. Progressive disclosure structures the definition in two layers so irrelevant knowledge never enters the context window in the first place.

1 Two layers: definition vs. skills

The split is the whole pattern. The definition is always loaded; skills load only when a task calls for them.

Layer	What it holds	Loaded
Definition	Identity, scope, quality bar, and skill references — typically under 50 lines	Every invocation
Skills	Step-by-step procedures, checklists, templates, tool-specific rules	On demand, per task

An agent drafting a blog post does not need its code-review checklist; an agent running a deployment does not need its content style guide. A monolithic definition loads both regardless. The skill version loads the definition, then reads only the skill the current task needs.
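The load path described above can be sketched in a few lines. This is a minimal illustration, not how any particular harness implements it: the `Agent` class, its `build_context` method, and the two skill names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical two-layer agent: a short definition plus named skills."""

    definition: str  # always loaded
    skills: dict[str, str] = field(default_factory=dict)  # loaded on demand

    def build_context(self, needed: list[str]) -> str:
        """Return the definition followed by only the requested skills."""
        parts = [self.definition]
        for name in needed:
            if name not in self.skills:
                raise KeyError(f"unknown skill: {name}")
            parts.append(self.skills[name])
        return "\n\n".join(parts)


# Illustrative content only; real skills would be read from separate files.
agent = Agent(
    definition="You are a team agent. Skills: code-review, style-guide.",
    skills={
        "code-review": "Code-review checklist: ...",
        "style-guide": "Content style guide: ...",
    },
)

# A blog-post task asks for the style guide and nothing else.
blog_context = agent.build_context(["style-guide"])
```

Here the code-review checklist never reaches `blog_context`, which is the point of the split: what is not requested is not loaded.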

2 The context-budget arithmetic

This isn't a tidiness argument — it's a token-budget one, and the numbers are concrete.

A monolithic 2000-token definition loads 2000 tokens every time. Split into a 200-token definition plus five 400-token skills, a task needing two skills loads 200 + 400 + 400 = 1000 tokens — half the baseline, same knowledge available. A CI agent's lint-only run drops from 1800 tokens to 470.
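The arithmetic can be checked with a short calculation. The token counts come from the example above; the function name and the `skill-1` … `skill-5` labels are placeholders.

```python
def tokens_loaded(definition_tokens: int,
                  skill_tokens: dict[str, int],
                  needed: list[str]) -> int:
    """Tokens entering the window: the definition plus each requested skill."""
    return definition_tokens + sum(skill_tokens[name] for name in needed)


MONOLITH_TOKENS = 2000
DEFINITION_TOKENS = 200
SKILL_TOKENS = {f"skill-{i}": 400 for i in range(1, 6)}  # five 400-token skills

two_skill_run = tokens_loaded(DEFINITION_TOKENS, SKILL_TOKENS, ["skill-1", "skill-2"])
all_skills_run = tokens_loaded(DEFINITION_TOKENS, SKILL_TOKENS, list(SKILL_TOKENS))

savings = 1 - two_skill_run / MONOLITH_TOKENS
print(two_skill_run, f"{savings:.0%}")  # 1000 50%
print(all_skills_run)                   # 2200
```

The second figure is worth noticing: with these numbers, a task that needs all five skills loads 2200 tokens, slightly more than the monolith. The split pays off because most tasks need only a subset.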

And the savings compound across fan-out: every sub-agent (Lesson 4) that inherits a bloated definition multiplies the waste across the whole fan-out. Trimming the definition once trims it for every worker the orchestrator ever spawns.
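To see the multiplication, assume each worker loads its definition once. The per-worker costs below reuse the 2000-token and 1000-token figures from the previous paragraph; the worker count of 8 is a made-up illustration, not a number from this lesson.

```python
def fanout_cost(tokens_per_worker: int, workers: int) -> int:
    """Total definition tokens paid across every spawned worker."""
    return tokens_per_worker * workers


WORKERS = 8  # hypothetical fan-out width

monolith_total = fanout_cost(2000, WORKERS)  # 16000
split_total = fanout_cost(1000, WORKERS)     # 8000
saved = monolith_total - split_total         # 8000
```

The saving grows linearly with the number of workers, so one edit to the shared definition is repaid on every spawn.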

# the definition is a skill index, not a manual
You are the CI review agent.
# Scope: running automated checks on pull requests.
Skills available:
  lint-check       # ESLint run + PR comment formatting
  security-scan    # Trivy scan, severity filtering
  license-audit    # license verification against policy
# read ONLY the skill matching the requested check

3 Why it works — and the portable standard

The mechanism is the same attention argument behind altitude in Lesson 2. A 2000-token definition forces the model's attention to spread across all 2000 tokens, including the ~80% irrelevant to this task — attention dilution, where critical instructions compete with noise. Worse, irrelevant rules can trigger instruction interference: the model enters self-reconciliation mode over rules that don't apply, producing hedged output. Smaller, focused contexts remove both failure modes.

Skills are portable. The Agent Skills standard formalizes the pattern with a SKILL.md entrypoint supported across Claude Code, GitHub Copilot, Cursor, and others — so the same skill files work regardless of which harness loads them. Skills live in .claude/skills/ (or .github/skills/) as separate files, never embedded.
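A loader for file-based skills might look like the sketch below. It assumes one directory per skill with a `SKILL.md` inside (for example `.claude/skills/lint-check/SKILL.md`); that layout and both function names are assumptions for illustration, so check your harness's documentation for the layout it actually expects.

```python
from pathlib import Path


def discover_skills(root: Path) -> dict[str, Path]:
    """Map skill name to its SKILL.md path for every <root>/<name>/SKILL.md."""
    return {
        entry.parent.name: entry
        for entry in sorted(root.glob("*/SKILL.md"))
    }


def load_skill(root: Path, name: str) -> str:
    """Read a single skill on demand; fail loudly if it does not exist."""
    skills = discover_skills(root)
    if name not in skills:
        raise FileNotFoundError(f"no SKILL.md for skill {name!r} under {root}")
    return skills[name].read_text(encoding="utf-8")
```

Calling `load_skill(Path(".claude/skills"), "lint-check")` would read only that one file; the other skills stay on disk until a task asks for them.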

Where progressive disclosure backfires

The split adds its own failure modes:

Skill-index rot: if a skill file is renamed or deleted but the definition still lists it, the agent tries to load a non-existent skill and falls back to guessing. The index must stay in sync with the filesystem.
Wrong skill loaded: agents pick skills by their own judgment, so ambiguous tasks or poorly named skills route to the wrong procedure.
Self-containment violations: a skill that implicitly depends on another being loaded first produces inconsistent output.

The pattern pays when tasks are clearly scoped and skills are genuinely orthogonal; it degrades when the task space is broad and overlapping.
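Skill-index rot is the one failure mode here that a script can catch before the agent does. The check below is a sketch under two assumptions: skills are stored as `<root>/<name>/SKILL.md`, and you can extract the list of skill names from the definition yourself. `check_index` is a hypothetical helper, not part of any tool.

```python
from pathlib import Path


def check_index(indexed: set[str], skills_root: Path) -> tuple[set[str], set[str]]:
    """Compare the names listed in the definition with the skills on disk.

    Returns (missing, unlisted):
      missing  - listed in the definition but no SKILL.md exists (broken load)
      unlisted - SKILL.md exists but the definition never mentions it
    """
    on_disk = {path.parent.name for path in skills_root.glob("*/SKILL.md")}
    missing = indexed - on_disk
    unlisted = on_disk - indexed
    return missing, unlisted
```

Run in CI or a pre-commit hook, a non-empty `missing` set can fail the build, so a rename is caught when it happens rather than when the agent starts guessing.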

↪ Your win: load knowledge on demand, not upfront

Keep the definition under ~50 lines — identity, scope, quality bar, and skill references only.
Push procedures into skills — checklists, templates, and tool-specific steps load per task.
Make each skill self-contained — no implicit "load X first" ordering the agent may not follow.
Keep the skill index in sync with the filesystem — a renamed file becomes a broken load otherwise.
Reach for the standard — a portable SKILL.md works across tools and multiplies savings across fan-out.

Retrieval practice — recall, don't peek

Question 1: In progressive disclosure, the always-loaded definition should contain…

Question 2: Splitting a 2000-token definition into a small definition plus on-demand skills…

Question 3: The token savings from skills compound most across…

Question 4: A skill file renamed without updating the index produces…

Question 5 (spaced recall from Lesson 07): For a task that outlives one session, durability comes from…

Next in Part 4: Commands vs Agents, separating the workflow from the expert who runs it.