Multi-Agent Systems · ~8 min
A second agent feels like more horsepower. Usually it's more coordination cost for no quality return. The first skill is knowing when not to add one.
Multi-agent systems are not an upgrade you apply to a hard task. They are a specific tool for a specific shape of problem — and the evidence says they lose more often than they win when that shape isn't present.
Microsoft's orchestration guidance states the rule plainly: "Use the lowest level of complexity that reliably meets your requirements." Anthropic's Building Effective Agents gives the same escalation — "add multi-step agentic systems only when simpler solutions fall short." There are three rungs:
| Rung | What it is | Solves |
|---|---|---|
| Direct model call | One prompt, no tools, no agent loop | Classification, summarization, single-step extraction |
| Single agent + tools | One agent that reasons, calls tools, loops until done | The right default for most tasks |
| Multi-agent | Several agents under an orchestrator or peer protocol | Only when prompt complexity, tool overload, or security boundaries break a single agent |
Each rung adds capability and failure surface. You only earn the next rung when the current one stops being reliable — not when the task merely looks intimidating.
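The escalation rule can be sketched as a small selection function. This is a minimal illustration, not a prescribed method: the rung identifiers, the `pass_rates` dictionary, the 0.95 threshold, and the example numbers are all assumptions made up here. It presumes you have some eval set that yields a pass rate per rung.

```python
from typing import Optional

# Rungs in escalation order, cheapest first (names mirror the table above).
RUNGS = ["direct_model_call", "single_agent_with_tools", "multi_agent"]


def lowest_reliable_rung(pass_rates: dict[str, float],
                         threshold: float = 0.95) -> Optional[str]:
    """Return the first rung whose measured pass rate meets the bar, else None."""
    for rung in RUNGS:
        # A rung that was never evaluated counts as unproven (rate 0.0).
        if pass_rates.get(rung, 0.0) >= threshold:
            return rung
    return None


# Hypothetical eval results for one task; the numbers are invented for illustration.
example = {
    "direct_model_call": 0.62,
    "single_agent_with_tools": 0.97,
    "multi_agent": 0.98,
}
print(lowest_reliable_rung(example))
```

With these invented numbers the function returns `single_agent_with_tools`: the multi-agent rung scores slightly higher, but the single agent already clears the bar, so the extra rung is not earned.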
The justification for multiple agents is narrow and specific: a task that must pursue several independent directions at once. A review of 94 multi-agent software-engineering papers confirms parallelism and specialization as the primary rationale for going multi-agent over single-agent.
The common thread is independence. If subtask B needs subtask A's output, that's a chain, not a fan-out — parallelism buys you nothing and the agents just wait on each other.
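One way to check whether a task is a chain or a fan-out is to write down its dependencies and group the subtasks into waves that could run at the same time. The sketch below uses the standard-library `graphlib.TopologicalSorter`; the function names and the subtask names (`auth`, `billing`, `search`, `merge`) are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; tasks in the same wave do not depend on each other.

    `deps` maps each subtask to the set of subtasks it must wait for.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock its dependents
    return waves


def max_parallelism(deps: dict[str, set[str]]) -> int:
    """Largest number of subtasks that could run at once."""
    return max(len(wave) for wave in parallel_waves(deps))


# A chain: B needs A, C needs B. Every wave has one task.
chain = {"A": set(), "B": {"A"}, "C": {"B"}}

# A fan-out: three independent investigations, then one merge step.
fan_out = {"auth": set(), "billing": set(), "search": set(),
           "merge": {"auth", "billing", "search"}}

print(parallel_waves(chain))
print(parallel_waves(fan_out))
```

If `max_parallelism` comes back as 1, as it does for `chain`, extra agents have nothing to run concurrently. A wide first wave, as in `fan_out`, is the shape where splitting the work can pay off.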
Coordination isn't free, and the price tag is large. Anthropic's research-system data reports token multipliers of ~4× for a single agent and ~15× for multi-agent (orchestrator plus workers) over a plain chat interaction — with token usage explaining roughly 80% of performance variance across research tasks.
On genuinely complex research, Anthropic's internal evals showed Opus orchestrating Sonnet workers outperformed single-agent Opus by 90.2%. The architecture earns its 15× when the task is broad and parallelizable. It wastes 15× when the task is narrow or sequential.
So the decision is economic as much as architectural: the task's value has to justify a 15× compute bill, and that only holds when the work genuinely splits into independent directions.
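The economics can be made concrete with a little arithmetic. The sketch below takes the approximate 4× and 15× multipliers quoted above and applies them to a baseline chat interaction. The baseline token count and the price per million tokens are placeholders, not real pricing, and the helper names are hypothetical.

```python
# Approximate token multipliers over a plain chat interaction, as quoted above.
TOKEN_MULTIPLIER = {"chat": 1, "single_agent": 4, "multi_agent": 15}


def estimated_cost(chat_tokens: int, usd_per_million_tokens: float, mode: str) -> float:
    """Rough cost of one task in the given mode, scaled from a chat baseline."""
    tokens = chat_tokens * TOKEN_MULTIPLIER[mode]
    return tokens * usd_per_million_tokens / 1_000_000


def multi_agent_premium(chat_tokens: int, usd_per_million_tokens: float) -> float:
    """Extra spend per task for choosing multi-agent over a single agent."""
    return (estimated_cost(chat_tokens, usd_per_million_tokens, "multi_agent")
            - estimated_cost(chat_tokens, usd_per_million_tokens, "single_agent"))


# Placeholder inputs: a 10,000-token chat baseline at an assumed $10 per million tokens.
baseline_tokens = 10_000
assumed_price = 10.0

for mode in TOKEN_MULTIPLIER:
    print(mode, round(estimated_cost(baseline_tokens, assumed_price, mode), 2))
print("premium", round(multi_agent_premium(baseline_tokens, assumed_price), 2))
```

The premium is what the quality gain has to be worth on each task. If the work is sequential and the gain is close to zero, that number is pure overhead.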
Retrieval practice — recall, don't peek
Question 1: The rule for picking a complexity level is to use the…
Question 2: Multi-agent's primary rationale over a single agent is…
Question 3: In the ten-benchmark evaluation, most multi-agent configurations…
Question 4: A rough token multiplier for orchestrator-plus-workers over chat is…
Question 5: A task where subtask B depends on subtask A's output should be…