Harness Engineering · ~8 min
One lead agent decomposes a task, fans it out to parallel workers, and synthesizes the results. The parallelism is the win — and the 15× token bill is the catch.
The orchestrator-worker pattern has two roles. The orchestrator receives the task, decomposes it into independent subtasks, dispatches each to a worker, and synthesizes the results. Workers each take a bounded subtask with their own tool set and return findings. The orchestrator never executes subtasks; workers never coordinate with each other.
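The division of labor can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `decompose`, `run_worker`, and `synthesize` are hypothetical stand-ins for what would be LLM calls in a real harness, and the three-way split is fixed only to keep the example runnable.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    subtask: str
    summary: str


def decompose(task: str) -> list[str]:
    # Hypothetical stand-in: a real orchestrator would ask the model to split the task.
    return [f"{task} / source {i}" for i in range(1, 4)]


async def run_worker(subtask: str) -> Finding:
    # Hypothetical stand-in for a worker with its own tool set.
    # Workers see only their own subtask and never talk to each other.
    await asyncio.sleep(0)
    return Finding(subtask=subtask, summary=f"findings for {subtask}")


def synthesize(task: str, findings: list[Finding]) -> str:
    # Placeholder only: real synthesis is a reasoning step, not a join.
    lines = [f"Report: {task}"]
    lines += [f"- {f.subtask}: {f.summary}" for f in findings]
    return "\n".join(lines)


async def orchestrate(task: str) -> str:
    subtasks = decompose(task)  # orchestrator plans
    # Fan out: all workers run concurrently; gather preserves input order.
    findings = await asyncio.gather(*(run_worker(s) for s in subtasks))
    return synthesize(task, list(findings))  # orchestrator combines


report = asyncio.run(orchestrate("compare vector databases"))
print(report)
```

Note that `orchestrate` never does a subtask itself, and `run_worker` has no handle on its siblings, which mirrors the two constraints in the paragraph above.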
Both results are true. Parallel dispatch pays off when subtasks are genuinely independent — multiple sources, different methodologies on one dataset, code review across separate modules with no shared state. It does not help when subtasks are sequentially dependent: that needs chaining, not fan-out. Adding workers to a task that doesn't decompose cleanly buys coordination cost and returns nothing.
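One way to check whether a task decomposes cleanly is to write down which subtasks depend on which and count how many can run at once. The sketch below uses the standard-library `graphlib.TopologicalSorter`; the subtask names and the `parallel_waves` helper are illustrative assumptions, not part of any particular harness.

```python
from graphlib import TopologicalSorter


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; everything in one wave can run in parallel.

    `deps` maps each subtask to the set of subtasks it must wait for.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # subtasks with no unmet dependencies
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Independent subtasks: code review across separate modules.
independent = {"module_a": set(), "module_b": set(), "module_c": set()}

# Sequentially dependent subtasks: each step needs the previous one.
chained = {"collect": set(), "clean": {"collect"}, "analyze": {"clean"}}

print(parallel_waves(independent))
print(parallel_waves(chained))
```

When every subtask lands in the first wave, fan-out is a good fit. When each wave holds a single subtask, extra workers would only wait on each other, which is the case for chaining.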
Two orchestrator responsibilities separate the wins from the waste. First, match worker count to complexity — and put the rule in the prompt, not in code:
| Query complexity | Allocation |
|---|---|
| Simple | 1 agent, 3–10 tool calls |
| Moderate | 2–4 subagents, clearly divided responsibilities |
| Complex | 10+ subagents with partitioned search spaces |
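Carried in the prompt, the table above might look roughly like the following. The wording of the prompt and the `build_orchestrator_prompt` helper are assumptions made for illustration; only the three tiers come from the table.

```python
# The tiers mirror the allocation table; the surrounding wording is illustrative.
EFFORT_SCALING_RULES = """\
Scale effort to the complexity of the query:
- Simple: use 1 agent and 3-10 tool calls.
- Moderate: use 2-4 subagents with clearly divided responsibilities.
- Complex: use 10+ subagents with partitioned search spaces.
Decide which tier applies before spawning any subagent."""


def build_orchestrator_prompt(task: str) -> str:
    """Assemble a system prompt that leaves the tier decision to the model."""
    return (
        "You are the lead agent. Decompose the task into independent subtasks, "
        "dispatch them to subagents, and synthesize their findings.\n\n"
        f"{EFFORT_SCALING_RULES}\n\n"
        f"Task: {task}"
    )


prompt = build_orchestrator_prompt("Summarize recent work on retrieval evaluation")
print(prompt)
```

The code contains no agent count at all: the model reads the rules and picks the tier per query, which is what keeps the allocation flexible.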
Hard-coding agent counts removes the flexibility to match scale to complexity. Second, synthesis is a reasoning step, not aggregation: the orchestrator evaluates each worker's reliability, finds conflicts and gaps, and produces a unified output drawing on the strongest elements. If it simply concatenates worker outputs, the pattern adds latency without improving quality.
Isolation has a price, and it's steep. Multi-agent orchestration consumes roughly 15× the tokens of a chat interaction (versus roughly 4× for a single agent), and token usage explains roughly 80% of performance variance across research tasks. The effort-scaling rules in the orchestrator's prompt are the primary cost control. The orchestrator prompt is also the highest-leverage component overall: Anthropic reports that small changes to it can shift subagent behavior unpredictably, so test decomposition explicitly across a range of inputs.
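To make the multipliers concrete, here is the arithmetic for a single query. The 2,000-token chat baseline is a hypothetical figure chosen for the example; only the ~4× and ~15× multipliers come from the text.

```python
CHAT_BASELINE_TOKENS = 2_000   # hypothetical baseline for one chat interaction
SINGLE_AGENT_MULTIPLIER = 4    # ~4x chat, per the figures above
MULTI_AGENT_MULTIPLIER = 15    # ~15x chat, per the figures above


def estimate_tokens(baseline: int, multiplier: int) -> int:
    """Rough token estimate: baseline chat usage scaled by the pattern's multiplier."""
    return baseline * multiplier


single_agent = estimate_tokens(CHAT_BASELINE_TOKENS, SINGLE_AGENT_MULTIPLIER)
multi_agent = estimate_tokens(CHAT_BASELINE_TOKENS, MULTI_AGENT_MULTIPLIER)

print(single_agent)                # 8000
print(multi_agent)                 # 30000
print(multi_agent / single_agent)  # 3.75
```

Under these assumptions, moving from a single agent to an orchestrated team costs nearly four times as much per query, so the extra quality has to be worth that factor.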
Over-spawning: too many workers for a simple query — effort-scaling rules prevent it. Orchestrator as single point of failure: a misclassified decomposition routes every worker to the wrong subtask, and the orchestrator's own LLM call caps throughput. Synthesis context overflow: the orchestrator must hold the task plus every worker's results — beyond ~4 substantive outputs this routinely blows the context budget. Premature termination and source-quality drift round it out. The pattern is conditional, not automatic: the task value must justify the 15× bill.
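Synthesis context overflow in particular can be checked before the synthesis call is made. The sketch below is an assumption-heavy illustration: the four-characters-per-token estimate is a crude rule of thumb rather than a real tokenizer, and the budget figures and the `fits_synthesis_budget` helper are hypothetical.

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic (assumption): about 4 characters per token.
    # A real harness would use the model's own tokenizer.
    return max(1, len(text) // 4)


def fits_synthesis_budget(
    task: str,
    worker_outputs: list[str],
    budget_tokens: int,
    reserve_tokens: int = 0,
) -> bool:
    """Return True if the task plus every worker output fits the context budget.

    `reserve_tokens` leaves room for the synthesized answer itself.
    """
    used = rough_token_count(task) + sum(rough_token_count(o) for o in worker_outputs)
    return used + reserve_tokens <= budget_tokens


task = "x" * 400                   # ~100 tokens under the heuristic
outputs = ["y" * 8_000] * 5        # five outputs of ~2,000 tokens each

print(fits_synthesis_budget(task, outputs[:3], budget_tokens=8_000))  # True
print(fits_synthesis_budget(task, outputs, budget_tokens=8_000))      # False
```

When the check fails, the orchestrator has to shrink its input before synthesizing, for example by asking workers for condensed findings, rather than passing everything through and hoping it fits.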
Retrieval practice — recall, don't peek
Question 1: In orchestrator-worker, the workers…
Question 2: Parallel dispatch pays off specifically when subtasks are…
Question 3: Effort-scaling rules (worker count vs. complexity) should live in…
Question 4: Multi-agent orchestration's token cost over chat is roughly…
Question 5 · spaced recall from Lesson 11: The reasoning sandwich gives extra-high compute to…