Harness Engineering · ~8 min
Agents produce bad output — that's a given. The design question is what recovery costs when they do, and whether a re-run cleans up or compounds the mess.
Two paired disciplines address this. Rollback-first design treats recovery cost as a first-class constraint: for every action, decide the undo before the do. Idempotency means running the same task twice leaves the same end state as running it once, with no duplicate branches, conflicting state, or compounded errors.
Rank every action by how hard it is to undo, and keep the workflow on the cheap end of that scale.
| Undo cost | Example | Design move |
|---|---|---|
| Instant | Delete a branch | Work on a branch by default |
| Easy | Close a draft PR, revert a commit | Draft PRs over direct pushes |
| Hard | Manual rollback, data restore | Add a human gate before it |
| Impossible | Sent email, charged card | Gate is the only defense |
The reversible primitives are familiar: git branches (delete = nothing on main is touched), draft PRs (close = nothing is merged), comments over body edits (appending is reversible; overwriting isn't), and checkpoints. External side effects (emails, webhooks, payments, CDN propagation) cannot be made reversible; gate them before the action, because there is no after.
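The ranking in the table can be made explicit in the harness. The following is a minimal sketch, not a definitive implementation: the action names, the `UNDO_COST` catalog, and the `approve` callback are all hypothetical, and the choice to gate everything at "Hard" or above is an assumption drawn from the table.

```python
from enum import IntEnum
from typing import Callable


class UndoCost(IntEnum):
    INSTANT = 0     # e.g. delete a branch
    EASY = 1        # e.g. close a draft PR, revert a commit
    HARD = 2        # e.g. manual rollback, data restore
    IMPOSSIBLE = 3  # e.g. sent email, charged card


# Hypothetical catalog: every action the agent may take is ranked up front.
UNDO_COST = {
    "create_branch": UndoCost.INSTANT,
    "open_draft_pr": UndoCost.EASY,
    "run_migration": UndoCost.HARD,
    "send_email": UndoCost.IMPOSSIBLE,
}


def needs_gate(action: str) -> bool:
    """Gate anything Hard or worse; unranked actions are treated as irreversible."""
    return UNDO_COST.get(action, UndoCost.IMPOSSIBLE) >= UndoCost.HARD


def run_action(action: str, do: Callable[[], None],
               approve: Callable[[str], bool]) -> str:
    """Run `do` only if the action is cheap to undo or a human approved it."""
    if needs_gate(action) and not approve(action):
        return "blocked"
    do()
    return "done"
```

Treating unknown actions as irreversible is the conservative default: a new tool has to be ranked before the agent can use it without approval.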
Agents fail mid-task and get re-run with no memory of the first attempt. If run one created a branch or posted a comment before failing, run two meets state it doesn't know about. Idempotent design makes the second run detect that state and converge on the same result.
The core techniques: check before act, upsert over create, unique identifiers (a comment tagged [#123] is updated, not duplicated), and state labels as checkpoints. Git is naturally idempotent in the cases that matter here: identical objects are stored once at the object level, and pushing an already-pushed commit is a no-op. Comment and label posts are not, so they need explicit guards.
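Check-before-act, upsert, and unique identifiers combine naturally for comments. The sketch below uses an in-memory `FakeIssue` as a stand-in for a real issue tracker; it is not any real API, and `upsert_comment` is a hypothetical helper. The marker string plays the role of the unique identifier.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeIssue:
    """In-memory stand-in for an issue's comment thread (not a real API)."""
    comments: Dict[int, str] = field(default_factory=dict)
    next_id: int = 1

    def create_comment(self, body: str) -> int:
        cid = self.next_id
        self.comments[cid] = body
        self.next_id += 1
        return cid

    def update_comment(self, cid: int, body: str) -> None:
        self.comments[cid] = body


def upsert_comment(issue: FakeIssue, marker: str, text: str) -> int:
    """Update the comment carrying `marker` if one exists, else create it."""
    body = f"{marker}\n{text}"
    # Check before act: look for a comment an earlier run may have left.
    for cid, existing in issue.comments.items():
        if existing.startswith(marker):
            issue.update_comment(cid, body)
            return cid
    return issue.create_comment(body)
```

Calling `upsert_comment(issue, "[#123]", ...)` twice leaves a single comment holding the latest text, which is the converged end state a re-run should reach.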
Rollback and idempotency both depend on per-artifact granularity, and that is the link between them. If run one crashed after creating the branch but before posting the comment, the second run still needs to finish the comment, so you guard each artifact, not the whole workflow. Checkpoints help both: they shrink the retry window so only the segment since the last checkpoint must be safe. The catch is the same on both sides: checkpoints capture file edits, but bash-driven side effects (an rm, a migration, an API call) aren't tracked, so most agent side effects still need explicit idempotency.
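Per-artifact guarding can be illustrated with a toy workflow. This is a sketch under stated assumptions: the `remote` set stands in for whatever external system holds the artifacts, the three artifact names are illustrative, and `crash_after` exists only to simulate a partial failure.

```python
from typing import List, Optional, Set

ARTIFACTS = ("branch", "commit", "comment")  # illustrative artifact names


def run_workflow(remote: Set[str], crash_after: Optional[str] = None) -> List[str]:
    """Create each missing artifact; return the ones this run actually created.

    `remote` stands in for external state that survives a crashed run.
    """
    performed = []
    for artifact in ARTIFACTS:
        # Guard each artifact separately, not the workflow as a whole.
        if artifact in remote:
            continue
        remote.add(artifact)
        performed.append(artifact)
        if artifact == crash_after:
            raise RuntimeError(f"simulated crash after creating {artifact}")
    return performed
```

A run that crashes after the branch leaves `remote` holding only the branch; the next run skips it and finishes the commit and comment, and a third run does nothing. A single workflow-level "already ran" flag could not express that middle state.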
Reversibility can hide root cause: trivial undo tempts "revert and retry" instead of fixing the bug, delaying diagnosis until it surfaces somewhere harder to reverse. Gate latency dominates in high-frequency inner loops, where forcing every action through approval costs more throughput than the recovery saves. Ephemeral environments are their own rollback — branch isolation in a throwaway container is redundant. And idempotency has its own traps: concurrency opens TOCTOU gaps (two runs both read "no branch" and both create it — prefer server-side keys with atomic claims), and silent skips hide drift from a different actor. Apply fully for shared codebases and production; relax when the blast radius is small.
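The TOCTOU gap closes when the check and the claim are a single atomic operation on the server. As a hedged illustration, the sketch below uses an in-memory SQLite table with a primary-key constraint as the "server-side key"; the table name, key format, and `try_claim` helper are assumptions for the example, and a real deployment would use whatever atomic primitive its backend offers.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory claims table; the primary key enforces uniqueness."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (key TEXT PRIMARY KEY, owner TEXT NOT NULL)")
    return conn


def try_claim(conn: sqlite3.Connection, key: str, owner: str) -> bool:
    """Atomically claim `key`; only the first caller gets True.

    There is no separate "does it exist?" read, so there is no window
    between check and act for a second run to slip through.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO claims (key, owner) VALUES (?, ?)", (key, owner)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The losing run learns who holds the claim by reading the `owner` column, which also addresses the silent-skip trap: it can log or escalate when the owner is a different actor instead of skipping quietly.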
Retrieval practice — recall, don't peek
Question 1: Rollback-first design means you choose…
Question 2: An idempotent agent operation, run twice, produces…
Question 3: Sending an external email or charging a card should be…
Question 4: Because failures can be partial, idempotency guards should be placed…
Question 5 · spaced recall from Lesson 13: An evaluator-optimizer loop hurts most when the generator's baseline is…