The working vocabulary for this course. Once a term lives here, every lesson uses this word for it. Grows as we go.
When and How
Complexity ladder · lowest level that works
The escalation rule for agent design: direct model call → single agent with tools → multi-agent orchestration. Take the next rung only when the current one stops being reliable — not because the task looks hard. Multi-agent is the top rung, justified by prompt complexity, tool overload, or security boundaries.
Avoid: "more agents is more power" — most multi-agent configurations underperform a strong single agent.
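The escalation rule can be sketched as a lookup over measured reliability. This is a minimal illustration, not a prescribed procedure: the rung identifiers, the `success_rate` measurements, and the 0.9 threshold are all hypothetical values chosen for the example.

```python
from typing import Optional

# Rungs in escalation order, as named in the entry above.
RUNGS = ["direct_model_call", "single_agent_with_tools", "multi_agent_orchestration"]

def lowest_reliable_rung(success_rate: dict, threshold: float = 0.9) -> Optional[str]:
    """Return the first rung whose measured success rate clears the threshold."""
    for rung in RUNGS:
        # A rung with no measurement counts as unreliable.
        if success_rate.get(rung, 0.0) >= threshold:
            return rung
    return None  # no rung is reliable on the current measurements

# Hypothetical eval results: the direct call is unreliable, the single agent is not.
measured = {"direct_model_call": 0.62, "single_agent_with_tools": 0.94}
print(lowest_reliable_rung(measured))
```

The point of the sketch is that the decision is driven by measured reliability on the current rung, not by how hard the task looks.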
The cost premium of coordination: a single agent runs ~4× the tokens of a chat interaction, orchestrator-plus-workers ~15×. Token usage explains roughly 80% of performance variance on research tasks, so the task's value must justify the bill before you go multi-agent.
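As a rough worked example of those multipliers, the sketch below estimates the token bill for one task. Only the 4× and 15× figures come from the entry above; the 2,000-token chat baseline and the $5 per million tokens price are illustrative assumptions.

```python
# Multipliers from the entry above; all other numbers are illustrative assumptions.
MULTIPLIER = {"chat": 1, "single_agent": 4, "orchestrator_workers": 15}

def estimated_cost_usd(architecture: str, chat_tokens: int, usd_per_million: float) -> float:
    """Estimate the token bill for one task under a given architecture."""
    tokens = chat_tokens * MULTIPLIER[architecture]
    return tokens * usd_per_million / 1_000_000

for arch in MULTIPLIER:
    print(arch, round(estimated_cost_usd(arch, 2_000, 5.0), 4))
```

Under these assumptions, the question to ask before going multi-agent is whether the task is worth roughly fifteen times the chat-level bill.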
One orchestrator decomposes a task into independent subtasks, dispatches them to parallel workers with scoped tools, and synthesizes the results. Workers never coordinate with each other — returning to the orchestrator is the only channel. Effort-scaling rules (1 / 2–4 / 10+ workers) live in the orchestrator's prompt.
Avoid: fanning out a sequentially-dependent task — that needs chaining, not parallelism.
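The fan-out half of this pattern can be sketched with a thread pool. `run_worker` is a hypothetical stand-in for a real worker agent call, and the subtask names are invented for the example; synthesis of the returned findings is a separate reasoning step and is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor

def run_worker(subtask: str) -> str:
    # Stand-in for a real worker agent call with scoped tools (hypothetical).
    return f"finding for: {subtask}"

def fan_out(subtasks: list) -> dict:
    """Dispatch independent subtasks in parallel; results return only to the caller."""
    if not subtasks:
        return {}
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        # pool.map preserves input order, so results line up with subtasks.
        results = list(pool.map(run_worker, subtasks))
    # Each worker saw only its own subtask; the orchestrator is the only fan-in point.
    return dict(zip(subtasks, results))

findings = fan_out(["pricing", "latency", "licensing"])
print(findings)
```

Each worker receives only its own subtask and has no handle to a sibling, which is one simple way to make "returning to the orchestrator is the only channel" structural rather than a convention.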
The reasoning step where one agent evaluates worker outputs, reconciles conflicts, and assembles a unified result from the strongest elements. It is deliberate assembly with justification, not concatenation and not a summary. An orchestrator that only concatenates adds latency without adding quality.
Avoid: "synthesis = gather the outputs" — gathering is aggregation; synthesis is judgment.
Run one task through N independent agents with engineered diversity (temperature, seed context, prompt emphasis), then a synthesis agent merges the strongest parts and a committee reviews the merge. N=3–5 is the efficient range; the synthesis agent is the highest-risk component — a weak one makes the result worse than the best single attempt.
Avoid: confusing it with voting — voting picks the popular answer, synthesis combines complementary strengths.
A child that inherits the parent's entire system prompt, tools, and message history. Its first request matches the parent's prompt cache (billed at cache-read rates, ~10%), but it also inherits the parent's bias and accumulated tool results, and it drops the input isolation a subagent otherwise provides. Fork when the child extends a load-bearing decision the parent has already made, ideally alongside sibling forks that amortize the shared prefix.
Avoid: forking a review, audit, or any child with egress when the parent touched untrusted content.
A child that starts with only the brief the orchestrator constructs — a different prompt and tool set, so its prefix doesn't match the parent cache. The reset distribution is what lets it disagree, making it the default for reviews, audits, and adversarial work. The fork/fresh axis is chosen per spawn, not globally.
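The fork/fresh distinction comes down to how the child's starting context is built. The sketch below is a simplified model: `AgentContext`, `fork`, `fresh`, and `shares_cache_prefix` are hypothetical names, and real prompt caches match on the exact serialized prefix rather than on the field equality used here.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    system_prompt: str
    tools: list
    messages: list = field(default_factory=list)

def fork(parent: AgentContext) -> AgentContext:
    # Copy everything: same prompt, same tools, same history (and same bias).
    return AgentContext(parent.system_prompt, list(parent.tools), list(parent.messages))

def fresh(brief: str, system_prompt: str, tools: list) -> AgentContext:
    # Start from only the brief the orchestrator constructs.
    return AgentContext(system_prompt, list(tools), [brief])

def shares_cache_prefix(a: AgentContext, b: AgentContext) -> bool:
    # Simplification: treat an identical prompt and tool set as a cache-matching prefix.
    return a.system_prompt == b.system_prompt and a.tools == b.tools

parent = AgentContext("You are a coding agent.", ["read", "edit"], ["task", "tool result"])
forked = fork(parent)
reviewer = fresh("Review this diff for bugs.", "You are a skeptical reviewer.", ["read"])
print(shares_cache_prefix(parent, forked), shares_cache_prefix(parent, reviewer))
```

Because the choice is made per spawn, the same orchestrator can call `fork` for one child and `fresh` for the next.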
An explicit contract — what the upstream stage produces and the downstream stage reads — passing conclusions, not transcripts. Common fields: what was done, what was found, what needs attention, what is unresolved. A typed schema makes field extraction deterministic; ambiguity at one handoff surfaces as a wrong action several agents downstream.
Avoid: forwarding the full transcript — it floods the receiver with reasoning it didn't produce.
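A typed handoff can be as small as a dataclass carrying the four common fields. The class name, field names, and the validation rule are assumptions made for this sketch, not a schema defined by the course.

```python
from dataclasses import dataclass, field, asdict

@dataclass(frozen=True)
class Handoff:
    """Conclusions passed between stages; no transcript field by design."""
    done: str                                             # what was done
    found: str                                            # what was found
    needs_attention: list = field(default_factory=list)   # what needs attention
    unresolved: list = field(default_factory=list)        # what is unresolved

    def __post_init__(self):
        # Reject an empty contract at the boundary instead of several agents downstream.
        if not self.done.strip() or not self.found.strip():
            raise ValueError("handoff requires non-empty 'done' and 'found'")

handoff = Handoff(
    done="Audited the auth module",
    found="Session tokens are never rotated",
    needs_attention=["token rotation"],
)
print(asdict(handoff))
```

The receiver reads named fields rather than parsing free text, which is what makes extraction deterministic.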
A teammate stays in read-only plan mode until the lead approves its plan; rejections round-trip with feedback before any write. The gate is enforced by the permission model, not prompt discipline — edit tools are structurally unavailable until approval. Approval criteria live in the lead's prompt; the lead's unique signal is cross-task coherence against the shared task list.
Avoid: gating a one-file edit — the doubled round-trip costs more than a post-write diff review.
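One way to picture a gate enforced by the permission model is a tool set that is computed from approval state, so a write tool is simply absent before approval. The tool names and the `Teammate` class below are hypothetical.

```python
READ_ONLY_TOOLS = frozenset({"read_file", "search"})
WRITE_TOOLS = frozenset({"edit_file", "write_file"})

class Teammate:
    def __init__(self):
        self.plan_approved = False
        self.feedback = []

    def available_tools(self) -> frozenset:
        # Write tools are structurally absent until the lead approves the plan.
        if self.plan_approved:
            return READ_ONLY_TOOLS | WRITE_TOOLS
        return READ_ONLY_TOOLS

    def review(self, approved: bool, feedback: str = "") -> None:
        # A rejection round-trips with feedback and leaves the teammate read-only.
        self.plan_approved = approved
        if not approved and feedback:
            self.feedback.append(feedback)

mate = Teammate()
mate.review(approved=False, feedback="Plan conflicts with task 3 on the shared list")
print(sorted(mate.available_tools()))
mate.review(approved=True)
print(sorted(mate.available_tools()))
```

No prompt instruction is involved in withholding the write tools, which is the difference between structural enforcement and prompt discipline.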
A separate read-only verifier, not the producer, sits on the critical path of every "done" claim, admits or rejects against deterministic checks, packetizes the decision, and fails closed on ambiguity. It earns its cost only when four conditions hold: the verifier is independent of the producer, ground truth exists, the verifier is on the critical path, and its blocked precision (the share of its rejections that were correct) has been measured.
Avoid: promoting an advisory verifier to enforcing without precision evidence — one deployment blocked valid work 99.6% of the time it rejected.
The rule that ambiguous and silent cases resolve to reject: the producer must clear the evidence bar, and missing evidence is rejection, not approval. Without it, an admission gate collapses into a stamping bureau.
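A fail-closed decision function is short enough to sketch in full. The check names and the result-dictionary shape are assumptions for the example; `blocked_precision` assumes the term means the share of rejections that were correct, which is how the 99.6% figure above reads.

```python
def admit(check_results: dict, required: list) -> dict:
    """Admit only when every required check is explicitly True."""
    if not required:
        # No defined evidence bar is itself ambiguous, so reject.
        return {"admitted": False, "blocking": ["no checks defined"]}
    # Missing, None, or any non-True value counts as a failure (fail closed).
    blocking = [name for name in required if check_results.get(name) is not True]
    return {"admitted": not blocking, "blocking": blocking}

def blocked_precision(correct_rejections: int, total_rejections: int) -> float:
    """Share of rejections that blocked genuinely invalid work."""
    if total_rejections == 0:
        raise ValueError("no rejections to measure")
    return correct_rejections / total_rejections

decision = admit({"tests": True, "lint": None}, ["tests", "lint", "types"])
print(decision)
print(blocked_precision(4, 1000))  # a gate that is wrong 99.6% of the time it rejects
```

The `is not True` comparison is deliberate: a silent check, a `None`, or a truthy non-boolean all resolve to reject.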
One of three primary failure categories in the Multi-Agent System Failure Taxonomy (MAST) — communication breakdowns, inconsistent goal understanding, protocol violations — accounting for 36.9% of failures across 1,600+ annotated production traces. The category that typed handoffs, plan gates, and admission control exist to reduce.
Small changes to a lead agent's prompt unpredictably alter subagent behavior, because subagents interpret a delegation through their own context rather than executing it deterministically. The fix is framework prompts over prescriptive ones — and end-to-end testing on every prompt change.
A multi-agent prompt that encodes principles, effort budgets, and quality criteria — what "done" looks like, not how to get there — so subagents adapt within boundaries instead of breaking on small wording changes. The cascade-resilient alternative to prescriptive step-by-step instructions.
Avoid: framework prompts for compliance-critical or single-correct-path tasks — those want rigid sequences.
Running N agent versions concurrently — new sessions route to the latest, existing sessions drain on the version that started them. Removes the two-version ceiling of canary; necessary because agents are stateful and an atomic cutover breaks in-flight sessions. Rollback is a router change, near-instant because old versions are still running.
Avoid: rainbow for single-turn stateless agents — plain blue-green is simpler and equally safe.
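The routing rule can be sketched as a small session-pinning router. `RainbowRouter` and its method names are hypothetical, and the sketch ignores draining and shutdown of versions with no remaining sessions.

```python
class RainbowRouter:
    def __init__(self, initial_version: str):
        self.latest = initial_version
        self.live = {initial_version}   # versions still running
        self.sessions = {}              # session id -> pinned version

    def deploy(self, version: str) -> None:
        # Old versions keep running; only the target for new sessions moves.
        self.live.add(version)
        self.latest = version

    def route(self, session_id: str) -> str:
        # Existing sessions stay on the version that started them.
        return self.sessions.setdefault(session_id, self.latest)

    def rollback(self, version: str) -> None:
        # Rollback is a pointer change, possible only because the version is still live.
        if version not in self.live:
            raise ValueError(f"{version} is no longer running")
        self.latest = version

router = RainbowRouter("v1")
router.route("session-a")
router.deploy("v2")
print(router.route("session-a"), router.route("session-b"))
```

In this sketch a rollback changes only where new sessions land; sessions already pinned to the rolled-back version are not moved.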
An agent version is the tuple of four independently versioned layers — code, model, prompt, tools — since a change to any one alters behavior. Rollback means reverting to a known-good tuple, not just the latest commit.
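The four-layer tuple maps directly onto a named tuple. The field values below (commit hash, model and prompt identifiers) are invented placeholders, and `changed_layers` is a hypothetical helper for seeing which layers a release actually moved.

```python
from typing import NamedTuple

class AgentVersion(NamedTuple):
    code: str     # e.g. a commit hash
    model: str    # model identifier
    prompt: str   # prompt revision
    tools: str    # tool manifest revision

def changed_layers(old: AgentVersion, new: AgentVersion) -> list:
    """List the layers that differ between two versions."""
    return [name for name in AgentVersion._fields if getattr(old, name) != getattr(new, name)]

known_good = AgentVersion(code="a1b2c3", model="model-2024-06", prompt="p14", tools="t3")
# Same commit, but the prompt changed: this is still a new agent version.
candidate = known_good._replace(prompt="p15")

print(changed_layers(known_good, candidate))
rollback_target = known_good  # revert the whole tuple, not just the code layer
```

Comparing tuples makes it visible that a prompt-only change is a new version even though the commit is unchanged.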