Harness Engineering · ~7 min
An agent definition is loaded on every invocation, whether the task needs it or not. Skills split the knowledge so only the slice the task requires ever enters the window.
A monolithic agent definition embeds every checklist and procedure it might ever need, and pays for all of them on every run. Progressive disclosure structures the definition in two layers so irrelevant knowledge never enters the context window in the first place.
The split is the whole pattern. The definition is always loaded; skills load only when a task calls for them.
| Layer | What it holds | Loaded |
|---|---|---|
| Definition | Identity, scope, quality bar, and skill references — typically under 50 lines | Every invocation |
| Skills | Step-by-step procedures, checklists, templates, tool-specific rules | On demand, per task |
An agent drafting a blog post does not need its code-review checklist; an agent running a deployment does not need its content style guide. A monolithic definition loads both regardless. The skill version loads the definition, then reads only the skill the current task needs.
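The two-layer split can be sketched in a few lines of Python. This is a minimal illustration, not any harness's actual loader: `DEFINITION`, `SKILLS`, and `build_context` are hypothetical names, the skill texts are placeholders, and the skills are held in memory where a real harness would read them from files. The caller also names the skill explicitly here, whereas in practice the agent chooses it.

```python
# Hypothetical in-memory stand-ins; a real harness would read these from files.
DEFINITION = (
    "You are a release assistant.\n"
    "Skills available: code-review, content-style."
)
SKILLS = {
    "code-review": "Code review checklist: check tests, naming, error handling.",
    "content-style": "Content style guide: short sentences, active voice.",
}


def build_context(definition: str, skills: dict[str, str], needed: list[str]) -> str:
    """Assemble the context: the definition always, skills only when requested."""
    parts = [definition]
    for name in needed:
        # Raises KeyError if the requested skill does not exist.
        parts.append(skills[name])
    return "\n\n".join(parts)


# A blog-post task loads the style guide; the review checklist never enters the context.
blog_context = build_context(DEFINITION, SKILLS, ["content-style"])
```

The definition still names every skill, so the agent knows what exists, but only the requested skill's body is appended.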
This isn't a tidiness argument — it's a token-budget one, and the numbers are concrete.
The savings compound across fan-out: every sub-agent (Lesson 4) that inherits a bloated definition pays for it again, so the waste scales with the number of workers. Trimming the definition once trims it for every worker the orchestrator ever spawns.
The mechanism is the same attention argument behind altitude in Lesson 2. A 2000-token definition forces the model's attention to spread across all 2000 tokens, including the ~80% irrelevant to this task — attention dilution, where critical instructions compete with noise. Worse, irrelevant rules can trigger instruction interference: the model enters self-reconciliation mode over rules that don't apply, producing hedged output. Smaller, focused contexts remove both failure modes.
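As a rough worked example of the token arithmetic, the sketch below takes the 2000-token definition and the ~80% irrelevant share from the paragraph above. Everything else is an assumption for illustration: the fan-out of 8 workers is hypothetical, and the lean load (definition plus one skill) is idealized as exactly the relevant 20%.

```python
def lean_tokens(monolith_tokens: int, irrelevant_percent: int) -> int:
    """Tokens left when only the task-relevant share is loaded (idealized)."""
    return monolith_tokens * (100 - irrelevant_percent) // 100


def fan_out_cost(tokens_per_worker: int, workers: int) -> int:
    """Total definition tokens paid when every worker loads its own copy."""
    return tokens_per_worker * workers


MONOLITH_TOKENS = 2000   # figure from the text
IRRELEVANT_PERCENT = 80  # approximate share from the text
WORKERS = 8              # assumption: hypothetical fan-out size

lean = lean_tokens(MONOLITH_TOKENS, IRRELEVANT_PERCENT)  # 400
monolith_total = fan_out_cost(MONOLITH_TOKENS, WORKERS)  # 16000
lean_total = fan_out_cost(lean, WORKERS)                 # 3200
saved = monolith_total - lean_total                      # 12800
```

Under these assumptions each worker saves 1600 tokens, and the saving grows linearly with the number of workers spawned.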
Skills are portable. The Agent Skills standard formalizes the pattern with a SKILL.md entrypoint supported across Claude Code, GitHub Copilot, Cursor, and others, so the same skill files work regardless of which harness loads them. Skills live in .claude/skills/ (or .github/skills/) as separate files, never embedded.
The split adds its own failure modes. Skill-index rot: if a skill file is renamed or deleted but the definition still lists it, the agent tries to load a non-existent skill and falls back to guessing — the index must stay in sync with the filesystem. Wrong skill loaded: agents pick skills by their own judgment, so ambiguous tasks or poorly-named skills route to the wrong procedure. Self-contained violations: a skill that implicitly depends on another being loaded first produces inconsistent output. The pattern pays when tasks are clearly scoped and skills are genuinely orthogonal; it degrades when the task space is broad and overlapping.
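Skill-index rot is the easiest of these failures to catch mechanically. The check below is a sketch under stated assumptions: `missing_skills` is a hypothetical helper, each skill is assumed to live at `<skills_dir>/<name>/SKILL.md`, and the list of referenced names is assumed to have been extracted from the definition already, since the text does not fix a format for skill references.

```python
from pathlib import Path


def missing_skills(referenced: list[str], skills_dir: Path) -> list[str]:
    """Return referenced skill names that have no SKILL.md under skills_dir.

    Assumes the layout <skills_dir>/<name>/SKILL.md (an assumption, see above).
    """
    return [
        name
        for name in referenced
        if not (skills_dir / name / "SKILL.md").is_file()
    ]


# Example call (hypothetical skill names):
# missing_skills(["code-review", "deploy"], Path(".claude/skills"))
```

Run as a pre-commit or CI step, a non-empty result would flag a renamed or deleted skill before an agent ever tries to load it. It does nothing for the other two failure modes, which depend on naming and on how self-contained each skill is.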
SKILL.md works across tools and multiplies savings across fan-out.
Retrieval practice: recall, don't peek
Question 1: In progressive disclosure, the always-loaded definition should contain…
Question 2: Splitting a 2000-token definition into a small definition plus on-demand skills…
Question 3: The token savings from skills compound most across…
Question 4: A skill file renamed without updating the index produces…
Question 5 · spaced recall from Lesson 07: For a task that outlives one session, durability comes from…