Part 2 · Guardrails

Harness Engineering · ~7 min

Sub-Agents & Orchestration

A sub-agent is a fresh context window: you hand it a task and get only the answer back. The isolation is the feature, and the bill.

Why this, for you: sub-agents are how you keep a long task's noise out of the main thread — spawn a search, a review, a test run, and the parent never sees the 40 tool calls underneath. But the same isolation that protects your context also hides bugs and multiplies tokens. Knowing when to delegate is the skill.

Up to now the harness shaped one agent's loop. Orchestration is the harness deciding to run more than one loop. The primitive is the sub-agent: an ephemeral, isolated agent that does a focused task and returns only its final result.

1 Isolation is structural, not a suggestion

Each sub-agent runs in its own fresh context window. It inherits none of the parent's history, reasoning, or sibling outputs — only the prompt you hand it. And the parent receives only the sub-agent's final text, never its intermediate tool calls.

The boundary is enforced by the runtime, not asked for in a prompt. That structure buys three things at once: context isolation (the sub-agent only sees what it needs), parallelization (they share nothing, so many run concurrently), and error isolation (one failing sub-agent doesn't cancel its siblings).
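To make the boundary concrete, here is a minimal sketch of what a runtime could do. Everything in it is hypothetical (`run_subagent`, `noisy_search`, the message strings); it is not how any particular harness is implemented. It only shows the shape: the sub-agent starts from the prompt alone, and the parent's history grows by exactly one entry, the final text.

```python
from typing import Callable


def run_subagent(prompt: str, work: Callable[[list[str]], str]) -> str:
    """Run `work` in a fresh context and return only its final text."""
    context = [prompt]      # fresh window: nothing inherited from the parent
    final = work(context)   # the worker may append many intermediate steps
    return final            # `context` is dropped here; only the answer leaves


def noisy_search(context: list[str]) -> str:
    """Hypothetical worker that makes 40 tool calls before answering."""
    for i in range(40):
        context.append(f"tool_call {i}: grep ...")
    return "Found 3 call sites in billing/."


parent_history = ["user: audit the billing module", "assistant: planning"]
result = run_subagent("List call sites of charge() in billing/", noisy_search)
parent_history.append(result)

print(len(parent_history))  # 3: the 40 tool calls never reached the parent
```

The design choice to notice is that the isolation lives in `run_subagent` itself. The worker cannot leak its intermediate steps even if it tries, because the parent only ever receives the return value.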

You define one as a Markdown file with YAML frontmatter in .claude/agents/. Two fields drive the harness payoff: tools restricts what it can do, and model routes it to a cheaper or stronger model per role.

# .claude/agents/reviewer.md
name: reviewer
description: Reviews one file for correctness and style
tools: [Read, Grep, Glob]  # can't write; scoped to read-only review
model: sonnet              # cheaper model for a bounded task
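What a tools allowlist buys can be sketched in a few lines. This is an illustration under assumptions, not the harness's actual enforcement code: `REGISTRY`, `make_dispatcher`, and `ToolNotAllowed` are invented names, and the tools are stubs that return strings instead of touching the filesystem.

```python
from typing import Callable


class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


# Hypothetical tool registry; real tools would read and write files.
REGISTRY: dict[str, Callable[[str], str]] = {
    "Read": lambda path: f"<contents of {path}>",
    "Grep": lambda pattern: f"<matches for {pattern}>",
    "Glob": lambda pattern: f"<paths matching {pattern}>",
    "Write": lambda path: f"<wrote {path}>",
}


def make_dispatcher(allowed: set[str]) -> Callable[[str, str], str]:
    """Build a tool dispatcher that refuses anything not in `allowed`."""
    def dispatch(tool: str, arg: str) -> str:
        if tool not in allowed:
            raise ToolNotAllowed(f"{tool} is not in this agent's tools list")
        return REGISTRY[tool](arg)
    return dispatch


# Mirrors the reviewer definition above: read-only tools, no Write.
reviewer = make_dispatcher({"Read", "Grep", "Glob"})
read_result = reviewer("Read", "app.py")

try:
    reviewer("Write", "app.py")
    write_blocked = False
except ToolNotAllowed:
    write_blocked = True
```

The check happens at dispatch time, outside the model, which is the same idea as the structural boundary in this section: the reviewer is not asked to avoid writing, it simply has no path to a write.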

2 Match the topology to the task

The most common shape is orchestrator-worker: a lead agent decomposes the task, fans out scoped sub-agents, and synthesizes their results. Pick the structure by what the work needs:

You need… | Reach for…
A focused, fire-and-forget result (review, search, test run) | A sub-agent
Independent parallel slices of one job | Fan-out sub-agents from an orchestrator
Agents that must exchange partial results or coordinate | An agent team (sub-agents can't talk to each other)
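The orchestrator-worker shape can be sketched with plain `asyncio`. The worker here is a stand-in for a sub-agent call, and `run_worker`, `orchestrate`, and the file names are all hypothetical. The point is the structure: decompose, fan out concurrently, tolerate individual failures, then synthesize.

```python
import asyncio


async def run_worker(task: str) -> str:
    """Stand-in for one sub-agent; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, as a network call would
    if "broken" in task:
        raise RuntimeError(f"worker failed on {task!r}")
    return f"summary of {task}"


async def orchestrate(job: str, slices: list[str]) -> str:
    """Fan out one worker per slice, then synthesize the results."""
    # return_exceptions=True keeps one failure from cancelling its siblings.
    results = await asyncio.gather(
        *(run_worker(s) for s in slices), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [s for s, r in zip(slices, results) if isinstance(r, BaseException)]
    return f"{job}: {len(ok)} slices done, {len(failed)} failed"


report = asyncio.run(orchestrate("review", ["a.py", "b.py", "broken.py"]))
print(report)  # review: 2 slices done, 1 failed
```

Note that the workers never see each other's output. If a slice needs another slice's partial result, this layout is the wrong one, and that is the case the last table row covers.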

3 The cost the isolation hides

Delegation is not free, and the markup is large.

Multi-agent costs ~15× the tokens

Anthropic's research-system retrospective reports that multi-agent systems use roughly 15× more tokens than a single-thread chat (a single agent already uses about 4×). When the work takes fewer tokens to do inline than to describe-and-delegate, a sub-agent is slower and more expensive. The value of the delegated task has to justify the markup.

Two more catches follow from the same isolation. Debugging is harder — the parent sees only the final result, so a sub-agent that quietly misreads its task leaves no trail. And coding parallelizes worse than research: code tasks have fewer independent slices, so fan-out helps less than it does for a search. Delegate for context hygiene and genuine parallelism — not reflexively.
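A back-of-the-envelope version of that trade-off, as a sketch. The two multipliers come from the figures quoted above; every token count in the examples is invented for illustration, and `delegation_tradeoff` is a hypothetical helper, not a measured cost model.

```python
# Multipliers relative to a single-thread chat, as quoted in this lesson.
MULTI_AGENT_VS_CHAT = 15
SINGLE_AGENT_VS_CHAT = 4

# So multi-agent costs roughly this many times a single agent.
multi_vs_single = MULTI_AGENT_VS_CHAT / SINGLE_AGENT_VS_CHAT  # 3.75


def delegation_tradeoff(
    inline_tokens: int,    # tokens the parent would burn doing the work itself
    brief_tokens: int,     # tokens to describe the task to the sub-agent
    subagent_tokens: int,  # tokens the sub-agent spends in its own window
    result_tokens: int,    # tokens of final text returned to the parent
) -> tuple[int, int]:
    """Return (parent context saved, extra total tokens spent) by delegating."""
    context_saved = inline_tokens - (brief_tokens + result_tokens)
    extra_spend = (brief_tokens + subagent_tokens + result_tokens) - inline_tokens
    return context_saved, extra_spend


# Made-up numbers: a noisy codebase search vs. a trivial one-line lookup.
big_search = delegation_tradeoff(20_000, 300, 25_000, 200)
trivial = delegation_tradeoff(150, 300, 2_000, 100)

print(big_search)  # (19500, 5500): pays extra, but keeps 19.5k out of context
print(trivial)     # (-250, 2250): costs more AND uses more parent context
```

The second case is the one to avoid: the brief alone is longer than the work, so delegating loses on both axes.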

↪ Your win: delegate to protect context, not by reflex

  • Spawn a sub-agent to keep noise out of the main thread — searches, reviews, test runs that would otherwise flood context.
  • Scope it with tools and model — least privilege plus the cheapest model that can do the job.
  • Fan out only genuinely independent work — and remember coding parallelizes less than research.
  • Reach for an agent team when sub-agents need to talk — sub-agents are fire-and-forget only.
  • Don't delegate trivial work — at ~15× tokens, describe-and-delegate has to beat doing it inline.

Retrieval practice — recall, don't peek

Question 1: The parent agent receives from a sub-agent…

Question 2: Multi-agent systems use roughly how many times the tokens of a single chat?

Question 3: When two agents must exchange partial results, the right tool is…

Question 4: A defining property of a sub-agent's context is that it…

Question 5 · spaced recall from Lesson 03: In a PreToolUse hook, exit code 2…

Next in Part 3: Permissions & Safety Boundaries, which covers what the agent is allowed to do at runtime.