Harness Engineering · ~7 min
A sub-agent is a fresh context window you spend a task into and get only the answer back. The isolation is the feature — and the bill.
Up to now the harness shaped one agent's loop. Orchestration is the harness deciding to run more than one loop. The primitive is the sub-agent: an ephemeral, isolated agent that does a focused task and returns only its final result.
Each sub-agent runs in its own fresh context window. It inherits none of the parent's history, reasoning, or sibling outputs — only the prompt you hand it. And the parent receives only the sub-agent's final text, never its intermediate tool calls.
You define one as a Markdown file with YAML frontmatter in .claude/agents/. Two fields drive the
harness payoff: tools restricts what it can do, and model routes it to a cheaper or
stronger model per role.
The most common shape is orchestrator-worker: a lead agent decomposes the task, fans out scoped sub-agents, and synthesizes their results. Pick the structure by what the work needs:
| You need… | Reach for… |
|---|---|
| A focused, fire-and-forget result (review, search, test run) | A sub-agent |
| Independent parallel slices of one job | Fan-out sub-agents from an orchestrator |
| Agents that must exchange partial results or coordinate | An agent team — sub-agents can't talk to each other |
Delegation is not free, and the markup is large.
Anthropic's research-system retrospective reports multi-agent systems use roughly 15× more tokens than a single-thread chat (a single agent already ~4×). When the work takes fewer tokens to do inline than to describe-and-delegate, a sub-agent is slower and more expensive. The value of the delegated task has to justify the markup.
Two more catches follow from the same isolation. Debugging is harder — the parent sees only the final result, so a sub-agent that quietly misreads its task leaves no trail. And coding parallelizes worse than research: code tasks have fewer independent slices, so fan-out helps less than it does for a search. Delegate for context hygiene and genuine parallelism — not reflexively.
tools and model — least privilege plus the cheapest model that can do the job.Retrieval practice — recall, don't peek
Question 1The parent agent receives from a sub-agent…
Question 2Multi-agent systems use roughly how many times the tokens of a single chat?
Question 3When two agents must exchange partial results, the right tool is…
Question 4A defining property of a sub-agent's context is that it…
Question 5 · spaced recall from Lesson 03In a PreToolUse hook, exit code 2…