Part 2 · Verifying As You Build

Verifying Agent Work · ~7 min

Check at Each Step

An agent writes 500 lines, then verifies. The wrong assumption was at line 10. Everything after it is built on a mistake — and unwinding that cascade is the expensive part.

Why this, for you: when verification happens matters as much as whether it happens. Checking after each meaningful unit catches errors near their source — a one-line fix instead of a cascade audit. This is the single highest-leverage habit for daily agent-assisted coding.

Incremental verification inserts checkpoints between stages. After each meaningful unit of work, you verify before proceeding. The cost of a checkpoint is low; the cost of debugging downstream consequences is high.

1 Error cost grows with distance

A type mismatch caught at the point of introduction is a one-line fix. The same mismatch found after 10 functions have been written against the wrong type requires auditing every call site.

This is the fail-fast principle: surface problems immediately, in the location where they occurred, with full context still available. The same compounding shows up in LLM pipelines — small misstatements at an intermediate stage undergo semantic shifts and amplify downstream.

2 Checkpoint patterns by medium

# checkpoint-save: bound the blast radius to one interval git commit -m "known-good" # save before a batch # … make changes … pytest && mypy src/ # verify # on fail: git restore → retry, instead of debugging a cascade

3 The verifier must beat the generator

A checkpoint only helps if it's more reliable than the thing it verifies. Compilers, type checkers, and tests qualify. An LLM-as-judge that hallucinates does not — it rejects correct work and blesses wrong work. And anchor "is it fixed?" to that external signal: a checkpoint that only inspects the agent's own narration ("I fixed the bug," "all tests pass") is easy to fool. Cross-reference every claim against git diff, build exit codes, and test output.

When the checkpoint cadence is wrong

Too small a unit suppresses exploration — a model forced to pass a type check before line two can't sketch across functions. Throwaway work (spikes, prototypes) is cheaper to rewrite than to incrementally verify; match cadence to risk-based task sizing. And granularity can misalign with the failure mode: if bugs only manifest across components, unit-level checkpoints pass while the real failure hides until the end-to-end test. Verify where the verifier is stronger, a wrong step is expensive, and the unit carries signal.

↪ Your win: catch errors near their source

Retrieval practice — recall, don't peek

Question 1The cost of an agent's error grows mainly with…

Question 2You should avoid writing multiple functions before testing because the second…

Question 3A checkpoint only adds value when the verifier is…

Question 4Checking after every single line can backfire by…

Question 5 · spaced recall from Lesson 3In a verification ledger, if the INSERT didn't happen, then…

Ask me anything. Want a checkpoint-save loop wired into a coding agent (commit → change → verify → restore-on-fail), or help picking the right unit size for a given task's risk? Next: The Pre-Completion Checklist — blocking "done" until a fixed sequence has run.
✎ Feedback