Part 2 · Coordination Contracts

Multi-Agent Systems · ~9 min

Verify-Gated Completion

The agent that did the work is the worst judge of whether it's done. Put a separate, read-only verifier on the critical path of every "done" claim — and make silence mean reject.

Why this, for you: LLMs prefer their own output when grading it, and self-refinement amplifies that bias rather than correcting it. The fix is admission control: the producer doesn't get to declare victory. But this pattern is costly and easy to misapply — it earns its keep only under four specific conditions, and one measured deployment blocked valid work 99.6% of the time it rejected something.

Verify-gated completion makes a read-only verifier — not the producer — the admission-control authority over every completion claim. It admits or rejects against deterministic checks, writes the decision into a structured record, and fails closed on anything ambiguous.
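A minimal sketch of that decision rule, assuming each deterministic check is a command whose exit code is the evidence. The names `CheckResult`, `Decision`, `run_check`, and `decide` are hypothetical and chosen for illustration; the point is only that anything other than "every check ran and exited 0" rejects, including a check that produced no result at all.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CheckResult:
    name: str
    exit_code: Optional[int]  # None means the check produced no result


@dataclass
class Decision:
    admitted: bool
    reason: str
    checks: List[CheckResult] = field(default_factory=list)


def run_check(name, argv, timeout_s=600):
    """Run one deterministic check; failure to run yields exit_code=None."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout_s)
        return CheckResult(name, proc.returncode)
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission error, or timeout: silence, not success.
        return CheckResult(name, None)


def decide(results):
    """Admit only when every check ran and exited 0; all else rejects."""
    results = list(results)
    if not results:
        return Decision(False, "no evidence supplied", results)
    for r in results:
        if r.exit_code is None:
            return Decision(False, f"{r.name}: no result (fail closed)", results)
        if r.exit_code != 0:
            return Decision(False, f"{r.name}: exit code {r.exit_code}", results)
    return Decision(True, "all checks passed", results)
```

The verifier here only runs commands and reads exit codes; it never edits the producer's work, which is what keeps it read-only.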

1 The three primitives

Separating the authority to declare done from the agent doing the work removes a measured self-judgement bias. An external verifier breaks the loop; packetized records make the decision auditable independent of either agent's narration. Either half alone is weaker — self-verification without records is unfalsifiable; records without an external verifier capture only the producer's chosen evidence.
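One way to picture a packetized record, as a sketch under stated assumptions: the field names, the `make_record` and `verify_record` helpers, and the choice of a SHA-256 digest over canonical JSON are all illustrative, not something the lesson prescribes. The record refuses to exist if producer and verifier are the same identity, and the digest lets an auditor detect later edits without trusting either agent's account.

```python
import hashlib
import json


def _digest(body):
    # Canonical JSON (sorted keys, fixed separators) so the hash is stable.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_record(claim_id, producer, verifier, checks, admitted):
    """Build a decision record; `checks` maps check name -> exit code."""
    if producer == verifier:
        raise ValueError("verifier must be distinct from the producer")
    body = {
        "claim_id": claim_id,
        "producer": producer,
        "verifier": verifier,
        "checks": dict(checks),
        "admitted": bool(admitted),
    }
    return {"body": body, "sha256": _digest(body)}


def verify_record(record):
    """Return True only if the stored digest still matches the body."""
    return record.get("sha256") == _digest(record.get("body"))
```

An auditor can then call `verify_record` on stored records and compare the recorded exit codes against the admission decision, with no need to read either agent's narration.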

2 Four conditions — all must hold

Condition | Why it's required
--- | ---
Verifier independent of producer | Same model class and training data admits the same hallucinations
Ground truth exists | Tests, type checks, schema validation, CI exit codes — not another LLM's opinion
Verifier on the critical path | A sidecar advisory verifier yields audit data, not admission control
Blocked precision measured | Without it, an enforcing gate may block more valid work than invalid

The reported gate had 0.39% blocked precision

In the cited deployment, the verifier hit 98.58% rule agreement but only 0.39% blocked precision — almost every rejection was a false positive. Promote an advisory verifier to enforcing without precision evidence and it mostly blocks valid work. The evidence base is one reporting cluster, 17 production events — re-measure before transferring the numbers.
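To make the metric concrete, here is a small sketch that assumes blocked precision means the fraction of blocked claims that were genuinely invalid, which is how the paragraph above uses it. The function names, the thresholds, and the counts in the example are invented for illustration and are not the cited deployment's data.

```python
def blocked_precision(true_blocks: int, false_blocks: int) -> float:
    """Fraction of blocked claims that were genuinely invalid."""
    total = true_blocks + false_blocks
    if total == 0:
        raise ValueError("no blocked claims: precision is undefined")
    return true_blocks / total


def ready_to_enforce(true_blocks, false_blocks, min_precision, min_samples):
    """Promotion check: too few labelled blocks counts as 'not ready'."""
    if true_blocks + false_blocks < min_samples:
        return False
    return blocked_precision(true_blocks, false_blocks) >= min_precision


# Invented counts for illustration only.
p = blocked_precision(true_blocks=3, false_blocks=97)
print(f"blocked precision: {p:.1%}")
```

Computing this requires labelling each rejection after the fact as a correct or incorrect block, so an advisory period that records decisions without enforcing them is the natural place to collect the counts.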

3 Where it backfires

The pattern re-allocates failure modes; it does not eliminate them. It adds a new one — producer-verifier disagreement over what "done" means — which the failure taxonomy already names under inter-agent misalignment. If the four conditions don't hold, prefer agent-internal checks or recording without admission control.
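The fallback rule can be written down as a tiny configuration check. This is a sketch with hypothetical names (`GateConditions`, `Mode`, `unmet`, `choose_mode`): it enables enforcement only when all four conditions hold and otherwise drops to recording without admission control.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List


class Mode(Enum):
    ENFORCING = "enforcing"      # verifier decides admission
    RECORD_ONLY = "record_only"  # decisions are recorded, nothing is blocked


@dataclass(frozen=True)
class GateConditions:
    verifier_independent: bool
    ground_truth_exists: bool
    on_critical_path: bool
    blocked_precision_measured: bool


def unmet(conditions: GateConditions) -> List[str]:
    """Names of the conditions that do not hold."""
    return [f.name for f in fields(conditions) if not getattr(conditions, f.name)]


def choose_mode(conditions: GateConditions) -> Mode:
    """Enforce only when nothing is unmet; otherwise record without gating."""
    return Mode.RECORD_ONLY if unmet(conditions) else Mode.ENFORCING
```

Returning the list of unmet conditions, instead of a bare boolean, makes it clear which prerequisite to work on before promoting the gate.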

↪ Your win: done is decided by evidence, not the doer

Retrieval practice — recall, don't peek

Question 1: The verifier in admission control is deliberately…

Question 2: Fail-closed defaults mean that an ambiguous completion claim is…

Question 3: A verifier sharing the producer's model class will tend to…

Question 4: Before promoting an advisory verifier to enforcing, you must have measured its…

Question 5 (spaced recall from Lesson 5): The largest category of multi-agent failure is…

Ask me anything. Want help wiring a deterministic verifier (build + test + schema) onto a PR-creation boundary, or deciding whether your "done" actually has ground truth to gate on? That closes Part 2. Next: Why Multi-Agent Systems Fail — the taxonomy under all of it.