When Many Agents Beat One

A second agent feels like more horsepower. Usually it's more coordination cost for no quality return. The first skill is knowing when not to add one.

Why this, for you: the reflex when an agent struggles is to add another agent. This lesson installs the opposite reflex — climb the complexity ladder one rung at a time, and reach for multiple agents only when the task structure forces it. Get this wrong and you pay ~15× the tokens for output a single agent would have matched.

Multi-agent systems are not an upgrade you apply to a hard task. They are a specific tool for a specific shape of problem — and the evidence says they lose more often than they win when that shape isn't present.

1 Climb the ladder, don't jump to the top

Microsoft's orchestration guidance states the rule plainly: "Use the lowest level of complexity that reliably meets your requirements." Anthropic's Building Effective Agents gives the same escalation — "add multi-step agentic systems only when simpler solutions fall short." There are three rungs:

Rung	What it is	Solves
Direct model call	One prompt, no tools, no agent loop	Classification, summarization, single-step extraction
Single agent + tools	One agent that reasons, calls tools, loops until done	The right default for most tasks
Multi-agent	Several agents under an orchestrator or peer protocol	Only when prompt complexity, tool overload, or security boundaries break a single agent

Each rung adds capability and failure surface. You only earn the next rung when the current one stops being reliable — not when the task merely looks intimidating.
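The ladder can be read as a small decision procedure. The sketch below is one hedged way to encode it in Python. The `TaskProfile` fields and the `choose_rung` function are hypothetical names invented for illustration, not part of Microsoft's or Anthropic's guidance, and the boolean signals assume you have already measured where the current rung fails rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum


class Rung(Enum):
    DIRECT_CALL = 1
    SINGLE_AGENT = 2
    MULTI_AGENT = 3


@dataclass
class TaskProfile:
    # Hypothetical signals; each should come from observed failures, not intuition.
    needs_tools_or_loop: bool             # one prompt with no tools cannot finish the task
    prompt_complexity_breaks_agent: bool  # a single agent's prompt has become unmanageable
    tool_overload_breaks_agent: bool      # a single agent picks the wrong tool too often
    security_boundary_required: bool      # parts of the task must not share context or access


def choose_rung(task: TaskProfile) -> Rung:
    """Return the lowest rung that the recorded signals allow."""
    if not task.needs_tools_or_loop:
        return Rung.DIRECT_CALL
    single_agent_is_broken = (
        task.prompt_complexity_breaks_agent
        or task.tool_overload_breaks_agent
        or task.security_boundary_required
    )
    # Only a documented failure of the single agent earns the top rung.
    return Rung.MULTI_AGENT if single_agent_is_broken else Rung.SINGLE_AGENT
```

Note that "the task looks intimidating" is deliberately not an input: the function escalates only on one of the three forcing conditions from the table.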

2 What multi-agent is actually good at

The justification for multiple agents is narrow and specific: a task that needs multiple independent directions at once. A review of 94 multi-agent software-engineering papers confirms parallelism and specialization as the primary rationale for going multi-agent over single-agent. Concretely:

Research spanning independent sources or domains explored simultaneously
Analysis applying different methodologies to the same dataset
Review across separate modules with no shared state

The common thread is independence. If subtask B needs subtask A's output, that's a chain, not a fan-out — parallelism buys you nothing and the agents just wait on each other.
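One way to test for independence before fanning out is to write the subtasks down as a dependency map and see how many can run at the same time. The sketch below uses Python's standard-library `graphlib.TopologicalSorter`; the `parallel_waves` helper and the example subtask names are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; every subtask in a wave can run concurrently.

    `dependencies` maps each subtask to the set of subtasks whose output it needs.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are available
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Independent research directions: one wide wave, so a fan-out can pay off.
research = {"papers": set(), "forums": set(), "docs": set()}
print(parallel_waves(research))  # [['docs', 'forums', 'papers']]

# B needs A's output: every wave has width 1, so this is a chain.
chain = {"A": set(), "B": {"A"}}
print(parallel_waves(chain))  # [['A'], ['B']]
```

If the widest wave contains a single subtask, extra agents would only wait on each other, which is the chain case described above.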

The benefit is conditional, not automatic. A protocol-aligned evaluation across ten benchmarks found most multi-agent configurations underperformed a single-agent baseline — only one of six tested workflows beat it. Adding workers to a task that doesn't decompose cleanly buys coordination cost with no quality return.

3 The bill comes due in tokens

Coordination isn't free, and the price tag is large. Anthropic's write-up of its multi-agent research system reports that a single agent uses about 4× the tokens of a plain chat interaction, and a multi-agent system (orchestrator plus workers) about 15×. In the same write-up, token usage alone explained roughly 80% of the performance variance on the BrowseComp evaluation.

The payoff, when the shape is right

On Anthropic's internal research eval, a system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2%. The architecture earns its 15× when the task is broad and parallelizable. It wastes 15× when the task is narrow or sequential.

So the decision is economic as much as architectural: the task's value has to justify a 15× compute bill, and that only holds when the work genuinely splits into independent directions.
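The economic check can be worked as simple arithmetic. The sketch below is a rough estimate only: it treats the ~4× and ~15× multipliers quoted above as constants, and the function names, the per-token price, and the "extra value" figure are all hypothetical inputs you would supply yourself.

```python
# Approximate multipliers over a plain chat interaction, as quoted in this lesson.
SINGLE_AGENT_MULTIPLIER = 4
MULTI_AGENT_MULTIPLIER = 15


def estimated_cost(chat_tokens: int, multiplier: int, usd_per_million_tokens: float) -> float:
    """Estimated spend in USD for a task whose plain-chat version uses `chat_tokens`."""
    return chat_tokens * multiplier * usd_per_million_tokens / 1_000_000


def multi_agent_justified(
    chat_tokens: int,
    usd_per_million_tokens: float,
    extra_value_usd: float,
    splits_into_independent_directions: bool,
) -> bool:
    """True only if the task parallelizes AND the extra value beats the extra spend."""
    if not splits_into_independent_directions:
        return False  # sequential work never earns the multi-agent bill
    single = estimated_cost(chat_tokens, SINGLE_AGENT_MULTIPLIER, usd_per_million_tokens)
    multi = estimated_cost(chat_tokens, MULTI_AGENT_MULTIPLIER, usd_per_million_tokens)
    return extra_value_usd > multi - single


# Hypothetical numbers: a 2,000-token chat task at $10 per million tokens.
print(multi_agent_justified(2_000, 10.0, extra_value_usd=1.00,
                            splits_into_independent_directions=True))
print(multi_agent_justified(2_000, 10.0, extra_value_usd=1.00,
                            splits_into_independent_directions=False))
```

With these made-up inputs the single-agent estimate is about $0.08 and the multi-agent estimate about $0.30, so the question becomes whether the better answer is worth roughly $0.22 more, and only if the work actually fans out.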

↪ Your win: a default of one, justified before many

Climb the ladder — direct call, then single agent, then multi-agent; take the next rung only when the current one is unreliable.
Reach for multiple agents only for independent, parallel directions — research, multi-method analysis, module-isolated review.
Never fan out a sequential task — dependencies need chaining; parallelism just makes agents wait.
Budget ~15× tokens — and make the task's value justify it before you commit.
Treat "most multi-agent setups lose to a single agent" as the prior — multi-agent must earn the win, not assume it.

Retrieval practice — recall, don't peek

Question 1: The rule for picking a complexity level is to use the…

Question 2: Multi-agent's primary rationale over a single agent is…

Question 3: In the ten-benchmark evaluation, most multi-agent configurations…

Question 4: A rough token multiplier for orchestrator-plus-workers over chat is…

Question 5: A task where subtask B depends on subtask A's output should be…

Next in Part 1: The Orchestrator and Its Workers, the canonical multi-agent shape, in depth.