Part 2 · Coordination Contracts

Multi-Agent Systems · ~9 min

Handoffs and Coordination Contracts

Every agent works in its own context window. The handoff between them is the one channel — and the single largest place multi-agent systems lose information.

Why this, for you: the biggest category of multi-agent failure isn't a bad agent — it's inter-agent misalignment, ambiguity at one handoff that surfaces as a wrong action several agents downstream. Two contracts fix it: a typed handoff at the boundary, and a plan-approval gate before any write. Both move the cost of a mistake to the cheapest possible moment.

A research agent's findings don't automatically reach the writer. Each handoff is an information-loss point: too little context and the next agent guesses wrong; too much and it drowns in noise. The contract is what the upstream agent writes and the downstream agent reads.

1 Structure the handoff, don't forward the transcript

The receiving agent needs conclusions, not transcripts. Forwarding a raw exploration log inflates its context with reasoning it didn't produce and can't efficiently parse. Define what each stage produces. Fields that recur across handoff formats include findings, items that need attention, and unresolved questions.

A structured JSON or markdown schema beats prose notes: field extraction is deterministic and doesn't depend on the receiver parsing free text. In longer pipelines the benefit compounds: ambiguity introduced at one stage multiplies at every later stage, so early structure keeps errors from propagating across multiple handoffs.
# research → writer handoff: conclusions and open items, not the trace
{
  "stage": "research",
  "findings": [
    "Tier 1 capped at 50 RPM",
    "429 carries Retry-After"
  ],
  "needs_attention": [
    "Verify Tier 4 limits by direct measurement"
  ],
  "unresolved": [
    "Whether prompt caching affects RPM is undocumented"
  ]
}
# writer reads findings for content, unresolved as open Qs, not facts
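One way the receiving side might enforce that contract is to parse the handoff into a typed object and fail loudly at the boundary. The following is a minimal sketch, not a prescribed implementation: the `Handoff` class and `parse_handoff` function are hypothetical names, and treating `stage` and `findings` as required while the other two fields are optional is an assumption made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Handoff:
    """Typed research -> writer handoff; field names mirror the example above."""
    stage: str
    findings: list[str]
    needs_attention: list[str] = field(default_factory=list)
    unresolved: list[str] = field(default_factory=list)


ALLOWED_FIELDS = {"stage", "findings", "needs_attention", "unresolved"}
LIST_FIELDS = ("findings", "needs_attention", "unresolved")


def parse_handoff(raw: str) -> Handoff:
    """Parse a JSON handoff and reject anything that breaks the contract."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")

    unknown = set(data) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    # Assumption: stage and findings are required, the rest are optional.
    for required in ("stage", "findings"):
        if required not in data:
            raise ValueError(f"missing required field: {required}")
    if not isinstance(data["stage"], str):
        raise ValueError("stage must be a string")

    for key in LIST_FIELDS:
        items = data.get(key, [])
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"{key} must be a list of strings")

    return Handoff(
        stage=data["stage"],
        findings=list(data["findings"]),
        needs_attention=list(data.get("needs_attention", [])),
        unresolved=list(data.get("unresolved", [])),
    )
```

Because `findings` and `unresolved` arrive as separate attributes, the writer can treat the first as content and the second as open questions without re-reading any free text.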

Persistent media — GitHub issues, PRs, comments — make handoffs durable, reviewable, and survivable across context resets. Labels encode pipeline state.
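As a rough illustration of labels encoding pipeline state, the sketch below models the labels on an issue as a small state machine. The label names (`stage:research` and so on) and the allowed transitions are hypothetical, and the code only manipulates a list of label strings; it does not call any issue-tracker API.

```python
from __future__ import annotations

# Hypothetical stage labels and the moves allowed between them.
TRANSITIONS: dict[str, set[str]] = {
    "stage:research": {"stage:writing"},
    "stage:writing": {"stage:review"},
    "stage:review": {"stage:writing", "stage:done"},
    "stage:done": set(),
}


def current_stage(labels: list[str]) -> str:
    """Return the single stage label on an issue, or raise if ambiguous."""
    stages = [label for label in labels if label in TRANSITIONS]
    if len(stages) != 1:
        raise ValueError(f"expected exactly one stage label, found {stages}")
    return stages[0]


def advance(labels: list[str], new_stage: str) -> list[str]:
    """Swap the stage label for new_stage if the transition is allowed."""
    old_stage = current_stage(labels)
    if new_stage not in TRANSITIONS[old_stage]:
        raise ValueError(f"cannot move from {old_stage} to {new_stage}")
    return sorted((set(labels) - {old_stage}) | {new_stage})
```

An agent resuming after a context reset could call `current_stage` on the issue's labels to learn where the pipeline stands, rather than reconstructing that from conversation history.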

2 Gate the first write: the plan-approval handshake

In a lead-and-teammates topology, each teammate works in its own window and self-claims tasks. A plan-approval handshake gates the first write: the teammate produces a plan in read-only mode, sends it to the lead, and only begins editing after approval. Rejections round-trip with feedback — no code written during the rejection round.

What makes it a gate, not a suggestion: plan mode is enforced by the permission model, not prompt discipline. The edit tools are structurally unavailable until the lead's approval changes the mode. A prompt-level "please get approval first" is routinely violated under load.
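The difference between a structural gate and a prompt-level request can be sketched in a few lines. This is a toy model under stated assumptions, not any framework's actual permission API: `Mode`, `Teammate`, `review`, and the tool names are all hypothetical. The point it illustrates is that the edit tool is absent from the teammate's tool set until the lead's approval flips the mode.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    PLAN = "plan"  # read-only
    EDIT = "edit"  # writes allowed


# Hypothetical tool names for illustration.
READ_TOOLS = frozenset({"read_file", "search"})
WRITE_TOOLS = frozenset({"edit_file"})


class Teammate:
    def __init__(self, name: str) -> None:
        self.name = name
        self.mode = Mode.PLAN  # every teammate starts in plan mode
        self.feedback: list[str] = []

    def available_tools(self) -> frozenset[str]:
        """Write tools exist only in edit mode."""
        if self.mode is Mode.EDIT:
            return READ_TOOLS | WRITE_TOOLS
        return READ_TOOLS

    def call_tool(self, tool: str) -> str:
        if tool not in self.available_tools():
            raise PermissionError(f"{tool} is unavailable in {self.mode.value} mode")
        return f"{self.name} ran {tool}"


def review(
    teammate: Teammate,
    plan: str,
    criterion: Callable[[str], Optional[str]],
) -> bool:
    """Lead-side review: approval changes the mode, rejection returns feedback."""
    reason = criterion(plan)
    if reason is None:
        teammate.mode = Mode.EDIT
        return True
    teammate.feedback.append(reason)
    return False


def requires_tests(plan: str) -> Optional[str]:
    """Example criterion: return a rejection reason, or None to approve."""
    if "test" in plan.lower():
        return None
    return "Plan must include test coverage."
```

In this sketch a rejected teammate stays in plan mode, so a call to `edit_file` raises `PermissionError` no matter what its prompt says.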

Approval criteria live in the lead's prompt — "only approve plans that include test coverage," "reject plans that modify the database schema." The unique signal the lead adds over a generic critic is cross-task coherence: it sees every teammate's plan against the shared task list, the only vantage point in the team with that view. That's the cheapest intervention against two teammates colliding on the same file.
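The cross-task coherence check can be made concrete with a small sketch. It assumes, purely for illustration, that each plan declares the set of files it intends to modify; `find_collisions` is a hypothetical helper, not part of any described tooling.

```python
from __future__ import annotations

from collections import defaultdict


def find_collisions(plans: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each file claimed by more than one teammate to those teammates.

    plans maps a teammate name to the files its plan intends to modify.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for teammate, files in plans.items():
        for path in files:
            owners[path].append(teammate)
    return {
        path: sorted(names)
        for path, names in owners.items()
        if len(names) > 1
    }


plans = {
    "alice": {"api.py", "db.py"},
    "bob": {"db.py", "cli.py"},
    "carol": {"docs.md"},
}
collisions = find_collisions(plans)
# collisions == {"db.py": ["alice", "bob"]}
```

Only the lead holds every plan at once, so only the lead can run a check like this before any teammate leaves plan mode.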

Both contracts have a backfire mode: ceremony

A schema on a single-stage task adds friction with no boundary to protect. A plan gate on a one-file edit doubles round-trips to catch what a post-write diff would. And a rigid schema can hide uncertainty — a well-formatted findings array reads as authoritative even when the upstream agent was tentative. Protocols pay off only when work genuinely crosses boundaries and a mistake is expensive to undo.

↪ Your win: explicit boundaries, cheap-moment review

Retrieval practice — recall, don't peek

Question 1: The receiving agent at a handoff should be given…

Question 2: A typed schema beats prose at a handoff because field extraction is…

Question 3: The plan-approval handshake is a real gate because plan mode is enforced by…

Question 4: The unique signal the lead adds over a generic critic is…

Question 5 · spaced recall from Lesson 4: A forked subagent is the wrong choice for a child that must…

Ask me anything. Want a handoff schema for a specific pipeline of yours, or approval criteria to put in a lead's spawn prompt? Next in Part 2: Verify-Gated Completion — who gets to decide the work is actually done.