Reversibility & Idempotency

Agents produce bad output — that's a given. The design question is what recovery costs when they do, and whether a re-run cleans up or compounds the mess.

Why this, for you: Lesson 5 stopped an agent from touching what it shouldn't; Lesson 7 made a crashed run resumable. This lesson covers the move underneath both: choosing how you'll undo an action before you choose how to do it, and making re-runs safe. Together they turn "the agent broke something" from an incident into a one-command fix.

Two paired disciplines. Rollback-first design treats recovery cost as a first-class constraint: for every action, decide the undo before the do. Idempotency makes running the same task twice produce the same end state — not duplicate branches, conflicting state, or compounded errors.

1 The undo-cost spectrum

The question is never whether the agent errs — it's the recovery cost when it does. Rank every action on how hard it is to undo, and keep the workflow on the cheap end.

Undo cost	Example	Design move
Instant	Delete a branch	Work on a branch by default
Easy	Close a draft PR, revert a commit	Draft PRs over direct pushes
Hard	Manual rollback, data restore	Add a human gate before it
Impossible	Sent email, charged card	Gate is the only defense

If the one-command undo doesn't exist, redesign the step before shipping the workflow. IBM Research's STRATUS uses a "transactional-no-regression" rule — mitigation agents may take only reversible actions, and commands per transaction are capped to keep rollbacks tractable. Bound each agent turn to changes you can undo in one step.
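One way to make the ranking above operational is to classify each action before the agent runs it. The sketch below is a minimal illustration, not a prescribed implementation: the action names (`create_branch`, `open_draft_pr`, `run_migration`, `send_email`, `charge_card`) and both function names are hypothetical, and anything unclassified is assumed to need a gate.

```shell
# Map a (hypothetical) action name to an undo-cost tier from the table.
undo_cost() {
  case "$1" in
    create_branch)            echo instant ;;     # undo: delete the branch
    open_draft_pr|commit)     echo easy ;;        # undo: close the PR, revert the commit
    run_migration)            echo hard ;;        # undo: manual rollback or data restore
    send_email|charge_card)   echo impossible ;;  # no undo exists
    *)                        echo unknown ;;     # unclassified: assume it needs a gate
  esac
}

# Exit status 0 when the action sits on the cheap end of the spectrum,
# 1 when it should wait for a human gate.
may_run_unattended() {
  case "$(undo_cost "$1")" in
    instant|easy) return 0 ;;
    *)            return 1 ;;
  esac
}
```

`may_run_unattended send_email` returns non-zero, so a caller can route that step to a human instead of executing it.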

The reversible primitives are familiar: git branches (delete the branch and main is untouched), draft PRs (close the PR and nothing merges), comments over body edits (appending is reversible; overwriting isn't), and checkpoints. External side effects such as emails, webhooks, payments, and CDN propagation cannot be made reversible; gate them before the action, because there is no after.
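A gate placed before an irreversible action can be as small as a wrapper that refuses to run without explicit approval. This is a sketch under stated assumptions: the `gate` function and the `APPROVED` environment variable are hypothetical names, and a real workflow would likely take approval from a review step rather than a variable.

```shell
# gate <description> <command...>
# Runs the command only when APPROVED=yes; otherwise blocks and reports why.
gate() {
  local what="$1"
  shift
  if [ "${APPROVED:-no}" != "yes" ]; then
    echo "blocked: $what needs human approval" >&2
    return 1
  fi
  "$@"
}
```

The check happens before the command is invoked, so a blocked call leaves no side effect to clean up.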

2 Idempotency: safe to retry

Agents fail mid-task and get re-run with no memory of the first attempt. If run one created a branch or posted a comment before failing, run two meets state it doesn't know about. Idempotent design makes the second run detect that state and converge on the same result.

# check-before-act: one read to avoid a conflicting write

# non-idempotent
git checkout -b feature/123

# idempotent
git checkout feature/123 2>/dev/null || git checkout -b feature/123

# unique key (issue #, SHA) makes an artifact findable instead of re-created

The core techniques: check before act, upsert over create, unique identifiers (a comment tagged [#123] is updated, not duplicated), and state labels as checkpoints. Git is largely idempotent by construction: objects are content-addressed, so identical content is stored once, and pushing a ref the remote already has is a no-op. Comment and label posts are not, so guard them explicitly.
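The upsert-by-unique-key idea can be sketched without any hosting API by modelling a comment thread as a directory with one file per comment. Everything here is an assumption for illustration: `upsert_comment`, the directory layout, and the file naming are hypothetical, and a real workflow would search and update comments through its platform's API instead.

```shell
# upsert_comment <thread-dir> <key> <body>
# Updates the comment tagged [key] if one exists, otherwise creates it once.
upsert_comment() {
  local dir="$1" key="$2" body="$3" existing n
  mkdir -p "$dir"
  # Check: look for a comment that already carries this key (fixed-string match).
  existing=$(grep -lF -- "[$key]" "$dir"/* 2>/dev/null | head -n 1)
  if [ -n "$existing" ]; then
    printf '[%s] %s\n' "$key" "$body" > "$existing"            # update in place
  else
    n=$(( $(ls "$dir" | wc -l) + 1 ))
    printf '[%s] %s\n' "$key" "$body" > "$dir/comment-$n.txt"  # first run: create
  fi
}
```

Calling it twice with the same key leaves one comment holding the latest body; a different key gets its own comment.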

3 How they reinforce each other

Both depend on per-artifact granularity, and that's the link. If run one crashed after creating the branch but before posting the comment, the second run still needs to finish the comment — so you guard each artifact, not the whole workflow. Checkpoints help both: they shrink the retry window so only the segment since the last checkpoint must be safe. The catch is the same on both sides — checkpoints capture file edits, but bash-driven side effects (an rm, a migration, an API call) aren't tracked, so most agent side effects still need explicit idempotency.
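Guarding each artifact separately might look like the sketch below, where a re-run skips what already exists and finishes what is missing. The names `ensure`, `run_workflow`, `create_branch`, and `post_comment` are hypothetical, and the `.done` marker files stand in for state labels; checking the artifact itself is usually a stronger guard than a marker, since a crash between the step and the marker write would repeat the step.

```shell
# Placeholder steps (hypothetical); a real workflow would do the actual work here.
create_branch() { echo "creating branch"; }
post_comment()  { echo "posting comment"; }

# ensure <state-dir> <artifact-name> <command...>
# Runs the command only if this artifact has not been recorded as done.
ensure() {
  local state="$1" name="$2"
  shift 2
  if [ -e "$state/$name.done" ]; then
    echo "skip $name"                # already produced by an earlier run
    return 0
  fi
  "$@" && : > "$state/$name.done"    # record completion only after success
}

# Each artifact has its own guard, so a partial failure resumes mid-workflow.
run_workflow() {
  local state="$1"
  mkdir -p "$state"
  ensure "$state" branch  create_branch || return 1
  ensure "$state" comment post_comment  || return 1
}
```

If the first run fails at `post_comment`, the second run prints `skip branch` and executes only the comment step.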

When reversibility machinery is overhead

Reversibility can hide root cause: trivial undo tempts "revert and retry" instead of fixing the bug, delaying diagnosis until it surfaces somewhere harder to reverse. Gate latency dominates in high-frequency inner loops, where forcing every action through approval costs more throughput than the recovery saves. Ephemeral environments are their own rollback — branch isolation in a throwaway container is redundant. And idempotency has its own traps: concurrency opens TOCTOU gaps (two runs both read "no branch" and both create it — prefer server-side keys with atomic claims), and silent skips hide drift from a different actor. Apply fully for shared codebases and production; relax when the blast radius is small.
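The TOCTOU gap closes when the check and the write are a single atomic operation. The text recommends server-side keys; as a local analogue only, the sketch below uses `mkdir`, which either creates the directory or fails because it already exists, with no separate read step in between. The `claim` function and the lock directory layout are hypothetical, and the guarantee is assumed for a local filesystem, not for every network filesystem.

```shell
# claim <lock-root> <key>
# Succeeds for exactly one caller per key; later callers get a non-zero status.
claim() {
  mkdir -p "$1" && mkdir "$1/$2" 2>/dev/null
}

# Usage sketch: only the run that wins the claim goes on to create the artifact.
create_once() {
  if claim "$1" "$2"; then
    echo "claimed $2"
  else
    echo "already claimed $2"
  fi
}
```

Two concurrent runs calling `claim` with the same key cannot both succeed, which is the property a read-then-create sequence lacks.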

↪ Your win: design the undo first, make retry safe

Pick the undo before the do — if recovery costs more than one command, reconsider the approach.
Keep actions on the cheap end — branches and draft PRs, not direct pushes to main.
Gate the irreversible — emails, payments, and webhooks have no rollback; the gate is the defense.
Check before act — one read avoids a duplicate branch, comment, or PR on re-run.
Guard each artifact, not the workflow — partial failure means the next run must finish the rest.

Retrieval practice — recall, don't peek

Question 1: Rollback-first design means you choose…

Question 2: An idempotent agent operation, run twice, produces…

Question 3: Sending an external email or charging a card should be…

Question 4: Because failures can be partial, idempotency guards should be placed…

Question 5 · spaced recall from Lesson 13: An evaluator-optimizer loop hurts most when the generator's baseline is…

Ask me anything. Want to audit one of your agent workflows for a one-command undo on every step, or add check-before-act guards to a pipeline that's been posting duplicates? Next, the last content lesson: Steering Running Agents — redirecting a live run without throwing away its context.