Verify-Gated Completion

The agent that did the work is the worst judge of whether it's done. Put a separate, read-only verifier on the critical path of every "done" claim — and make silence mean reject.

Why this, for you: LLMs prefer their own output when grading it, and self-refinement amplifies that bias rather than correcting it. The fix is admission control: the producer doesn't get to declare victory. But this pattern is costly and easy to misapply. It earns its keep only under four specific conditions, and in one measured deployment roughly 99.6% of the gate's rejections blocked valid work.

Verify-gated completion makes a read-only verifier — not the producer — the admission-control authority over every completion claim. It admits or rejects against deterministic checks, writes the decision into a structured record, and fails closed on anything ambiguous.

1 The three primitives

Read-only verifier as completion authority. No write access to the work product — it inspects state, runs deterministic checks, emits admit/reject. Read-only is structural: it can't patch or retry, so correctness can't be offloaded onto it.
Packetized admission records. Each decision is a structured record — task id, evidence refs, verifier identity, decision, timestamp — not prose. Every completion gets a packet; ambiguous cases are inspectable after the fact.
Fail-closed defaults. Ambiguous cases resolve to reject. The producer must clear the evidence bar; silence is rejection. Without this, missing evidence collapses to "admit" and the gate becomes a stamping bureau.
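
A minimal sketch of how the three primitives could fit together, assuming Python. Every name here (`AdmissionPacket`, `gate`, the `Check` signature, the `verifier-1` id) is hypothetical and chosen for illustration, not taken from any library. Each check returns `True`, `False`, or `None` for "could not determine", and anything short of all-`True` with evidence attached resolves to reject.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical check signature: inspect a read-only view of the state and
# return True (pass), False (fail), or None (could not determine).
Check = Callable[[Mapping[str, object]], Optional[bool]]


@dataclass(frozen=True)
class AdmissionPacket:
    """Structured admission record; the field names are illustrative."""
    task_id: str
    evidence_refs: tuple
    verifier_id: str
    decision: str   # "admit" or "reject"
    reasons: tuple  # empty when admitted
    timestamp: str


def gate(task_id: str,
         state: Mapping[str, object],
         evidence_refs: Sequence[str],
         checks: Mapping[str, Check],
         verifier_id: str = "verifier-1") -> AdmissionPacket:
    # Checks only ever see a read-only (shallow) view, so they cannot patch
    # the work product through this mapping.
    view = MappingProxyType(dict(state))
    reasons = []
    if not evidence_refs:
        reasons.append("no evidence supplied")   # silence means reject
    if not checks:
        reasons.append("no checks configured")   # nothing verified means reject
    for name, check in checks.items():
        try:
            result = check(view)
        except Exception as exc:
            # A crashing check is ambiguous, so it counts against admission.
            reasons.append(f"{name}: raised {type(exc).__name__}")
            continue
        if result is not True:
            outcome = "failed" if result is False else "undetermined"
            reasons.append(f"{name}: {outcome}")
    return AdmissionPacket(
        task_id=task_id,
        evidence_refs=tuple(evidence_refs),
        verifier_id=verifier_id,
        decision="reject" if reasons else "admit",
        reasons=tuple(reasons),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

The only path to `admit` is an empty `reasons` list, so a missing evidence reference, an undetermined check, or a check that tries to write to the state all land on `reject`, and the packet records why.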

Separating the authority to declare done from the agent doing the work removes a measured self-judgement bias. An external verifier breaks the loop; packetized records make the decision auditable independent of either agent's narration. Either half alone is weaker — self-verification without records is unfalsifiable; records without an external verifier capture only the producer's chosen evidence.

2 Four conditions — all must hold

Condition	Why it's required
Verifier independent of producer	A verifier from the same model class and training data admits the same hallucinations
Ground truth exists	Tests, type checks, schema validation, CI exit codes — not another LLM's opinion
Verifier on the critical path	A sidecar advisory verifier yields audit data, not admission control
Blocked precision measured	Without it, an enforcing gate may block more valid work than invalid
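
To make the "ground truth exists" row concrete, here is a hedged sketch of a deterministic check built on a process exit code. The helper name `exit_code_check` and the default timeout are assumptions for illustration. It uses only `subprocess.run` from the standard library, and it maps "could not run at all" to `None` so that a fail-closed gate can treat it as ambiguous rather than as a pass.

```python
import subprocess
import sys
from typing import Optional, Sequence


def exit_code_check(cmd: Sequence[str], timeout_s: float = 60.0) -> Optional[bool]:
    """Run a command and judge it purely by exit code.

    Returns True on exit code 0, False on any other exit code, and None if
    the command could not be started or did not finish in time.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return None  # undetermined: the caller should fail closed
    return proc.returncode == 0


if __name__ == "__main__":
    # Stand-in commands; a real gate would run the project's own test,
    # type-check, or schema-validation command here.
    print(exit_code_check([sys.executable, "-c", "raise SystemExit(0)"]))
    print(exit_code_check([sys.executable, "-c", "raise SystemExit(3)"]))
```

The verdict comes from the exit code alone, not from anyone's reading of the output, which is what separates ground truth from a second opinion.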

The reported gate had 0.39% blocked precision

In the cited deployment, the verifier hit 98.58% rule agreement but only 0.39% blocked precision: of the claims it blocked, only 0.39% were actually invalid, so almost every rejection was a false positive. Promote an advisory verifier to enforcing without precision evidence and it will mostly block valid work. The evidence base is one reporting cluster, 17 production events; re-measure before transferring the numbers.
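
A small sketch of how you might measure this on your own gate, reading blocked precision as the share of blocked claims that were truly invalid (consistent with "almost every rejection was a false positive"). The function names, the audit labels, and the threshold are all hypothetical, and the counts in the example are made up for illustration. They are not the deployment's data.

```python
from typing import Iterable, Optional


def blocked_precision(blocked_was_invalid: Iterable[bool]) -> Optional[float]:
    """Share of blocked claims that were actually invalid.

    Each item is a human-audited label for one blocked claim:
    True if the work really was invalid, False if the gate blocked valid work.
    Returns None when nothing was blocked, since precision is undefined.
    """
    labels = list(blocked_was_invalid)
    if not labels:
        return None
    return sum(labels) / len(labels)


def ready_to_enforce(precision: Optional[float], minimum: float) -> bool:
    """Hypothetical promotion rule: no measurement means stay advisory."""
    return precision is not None and precision >= minimum


if __name__ == "__main__":
    # Made-up audit: 500 blocked claims, only 2 of them truly invalid.
    audit = [True] * 2 + [False] * 498
    p = blocked_precision(audit)
    print(f"blocked precision: {p:.2%}")
    print("promote to enforcing:", ready_to_enforce(p, minimum=0.9))
```

The `minimum` bar is yours to set. The point of the sketch is that an unmeasured gate (`None`) never gets promoted.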

3 Where it backfires

Verifier shares the producer's failure modes — same class, same blind spots, same admitted hallucinations.
No independent ground truth — "done" is just another agent's opinion; verifier and producer argue the same uncertain claim.
Bypass paths — if agents route around the verifier via direct file writes, the gate is a suggestion.
Short, low-stakes work — the bookkeeping exceeds the audit value for single-turn or exploratory tasks.

The pattern re-allocates failure modes; it does not eliminate them. It adds a new one — producer-verifier disagreement over what "done" means — which the failure taxonomy already names under inter-agent misalignment. If the four conditions don't hold, prefer agent-internal checks or recording without admission control.
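
The decision rule above can be sketched as a small helper. All names (`Conditions`, `GateMode`, `choose_mode`) are hypothetical. The text only says to fall back to agent-internal checks or recording without admission control, so how this sketch picks between those two fallbacks is an assumption, not something the source specifies.

```python
from dataclasses import dataclass
from enum import Enum


class GateMode(Enum):
    ENFORCING = "verify-gated completion"
    ADVISORY = "recording without admission control"
    INTERNAL = "agent-internal checks"


@dataclass(frozen=True)
class Conditions:
    verifier_independent: bool
    ground_truth_exists: bool
    on_critical_path: bool
    blocked_precision_measured: bool


def choose_mode(c: Conditions) -> GateMode:
    # Enforce only when all four conditions hold.
    if (c.verifier_independent and c.ground_truth_exists
            and c.on_critical_path and c.blocked_precision_measured):
        return GateMode.ENFORCING
    # Assumption: with ground truth available, recording decisions still
    # yields useful audit data even though the gate does not block.
    if c.ground_truth_exists:
        return GateMode.ADVISORY
    # Assumption: without ground truth, there is nothing objective to record.
    return GateMode.INTERNAL
```

For example, a gate with independent verification, ground truth, and an on-path position but no precision measurement comes out as `GateMode.ADVISORY` rather than enforcing.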

↪ Your win: done is decided by evidence, not the doer

Put a read-only verifier on the critical path of every completion claim — structurally unable to patch.
Packetize every decision — structured records, not prose, so ambiguous cases stay inspectable.
Fail closed — ambiguity and silence both mean reject; the producer clears the bar.
Demand all four conditions — independence, ground truth, on-path, measured blocked precision.
Don't promote advisory to enforcing without precision evidence — 0.39% blocked precision blocks valid work.

Retrieval practice — recall, don't peek

Question 1: The verifier in admission control is deliberately…

Question 2: Fail-closed defaults mean that an ambiguous completion claim is…

Question 3: A verifier sharing the producer's model class will tend to…

Question 4: Before promoting an advisory verifier to enforcing, you must have measured its…

Question 5 · spaced recall from Lesson 5: The largest category of multi-agent failure is…

Want help wiring a deterministic verifier (build + test + schema) onto a PR-creation boundary, or deciding whether your "done" actually has ground truth to gate on? That closes Part 2. Next: Why Multi-Agent Systems Fail, the taxonomy under all of it.