An agent writes 500 lines, then verifies. The wrong assumption was at line 10. Everything after it is built on a mistake — and unwinding that cascade is the expensive part.
Why this, for you: when verification happens matters as much as whether it happens. Checking
after each meaningful unit catches errors near their source — a one-line fix instead of a cascade audit. This is
the single highest-leverage habit for daily agent-assisted coding.
Incremental verification inserts checkpoints between stages. After each meaningful unit of work, you
verify before proceeding. The cost of a checkpoint is low; the cost of debugging downstream consequences is high.
1 Error cost grows with distance
A type mismatch caught at the point of introduction is a one-line fix. The same mismatch found
after 10 functions have been written against the wrong type requires auditing every call site.
This is the fail-fast principle: surface
problems immediately, in the location where they occurred, with full context still available. The same compounding
shows up in LLM pipelines — small misstatements at an intermediate stage undergo semantic shifts and amplify
downstream.
2 Checkpoint patterns by medium
Code — build → test → iterate. Implement one function, run the suite, fix, move on. Do
not write multiple functions before running tests; the second may build on a broken assumption in the
first. Type checking is continuous verification — it compiles after each change, not after a batch.
Documents — claim by claim. Check each source as it's cited. A hallucinated citation in
paragraph 2 invalidates every argument that builds on it; verifying at the end means re-reading retroactively.
Agent pipelines — stage gates. Put verification between stages, not only at the end.
A research → draft → review pipeline with no gate after research means the draft is built on unverified claims.
# checkpoint-save: bound the blast radius to one interval
git commit -m "known-good" # save before a batch# … make changes …
pytest && mypy src/ # verify# on fail: git restore → retry, instead of debugging a cascade
3 The verifier must beat the generator
A checkpoint only helps if it's more reliable than the thing it verifies. Compilers, type
checkers, and tests qualify. An LLM-as-judge that hallucinates does not — it rejects correct work and blesses wrong
work. And anchor "is it fixed?" to that external signal: a checkpoint that only inspects the agent's own narration
("I fixed the bug," "all tests pass") is easy to fool. Cross-reference every claim against git diff,
build exit codes, and test output.
When the checkpoint cadence is wrong
Too small a unit suppresses exploration — a model forced to pass a type check before line two
can't sketch across functions. Throwaway work (spikes, prototypes) is cheaper to rewrite than to
incrementally verify; match cadence to risk-based task sizing. And granularity can misalign with
the failure mode: if bugs only manifest across components, unit-level checkpoints pass while the real
failure hides until the end-to-end test. Verify where the verifier is stronger, a wrong step is expensive, and
the unit carries signal.
↪ Your win: catch errors near their source
Error cost grows with distance — a line-10 mistake is a cascade if found at line 500.
One unit, then verify — don't batch functions ahead of the first passing test.
Gate between pipeline stages — verify research before drafting, claims before review.
Save a known-good state before a batch; restore-and-retry beats debugging a cascade.
The verifier must beat the generator — tests and compilers qualify; a flaky judge doesn't.
Retrieval practice — recall, don't peek
Question 1The cost of an agent's error grows mainly with…
Question 2You should avoid writing multiple functions before testing because the second…
Question 3A checkpoint only adds value when the verifier is…
Question 4Checking after every single line can backfire by…
Question 5 · spaced recall from Lesson 3In a verification ledger, if the INSERT didn't happen, then…
Ask me anything. Want a checkpoint-save loop wired into a coding agent (commit → change → verify →
restore-on-fail), or help picking the right unit size for a given task's risk? Next: The Pre-Completion
Checklist — blocking "done" until a fixed sequence has run.