Part 4 · Reliability Under Failure

Tool Engineering · ~7 min

Idempotent, Safe-to-Retry Tools

Agents fail mid-task and get re-run from scratch — with no memory of what the first run already did. A tool that wasn't designed for that turns a retry into a duplicate.

Why this, for you: this lesson covers the surface that decides what happens on the second attempt. Every other lesson so far made the first call go right. This one makes the inevitable re-run safe, so a timeout or a context overflow costs you a retry, not a duplicate branch, a doubled charge, or two conflicting records.

Agents fail in ordinary ways: the context window fills, an API times out, a tool error halts the loop. When you re-run the agent it starts fresh, with no memory of what it already did. If the first run created a branch, posted a comment, or charged a card before failing, the second run walks into pre-existing state it can't see — and the design of your tool decides whether that heals or compounds.

An operation is idempotent when running it twice produces the same end state as running it once. Without idempotency, a re-run duplicates branches, doubles comments, and re-triggers charges. With it, the second run detects existing state and lands exactly where a single clean run would have.
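
A minimal sketch of that definition, using a hypothetical label list as the state. The function names are illustrative and not part of any real API; the point is only that the "ensure" variant reaches the same end state on the second call, while the "append" variant does not.

```python
def add_label_append(labels: list[str], label: str) -> list[str]:
    """Non-idempotent: every call appends, so a re-run duplicates the label."""
    return labels + [label]


def add_label_ensure(labels: list[str], label: str) -> list[str]:
    """Idempotent: the label ends up present exactly once, however often this runs."""
    return labels if label in labels else labels + [label]


# Simulate a first run followed by a re-run of the same step.
ensured_once = add_label_ensure([], "needs-review")
ensured_twice = add_label_ensure(ensured_once, "needs-review")

appended_twice = add_label_append(add_label_append([], "needs-review"), "needs-review")
```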

1 Check before you act

The foundational technique is one read before every write: before creating, check if it already exists; before posting, check if an equivalent is already there. The cost is a single read operation; the alternative is duplicate state the agent can't reason about.

# Non-idempotent: second run fails or duplicates
git checkout -b feature/123

# Idempotent: find-or-create on a unique key
git checkout feature/123 2>/dev/null || git checkout -b feature/123

The same shape generalises: upsert over create (update the existing artifact instead of failing on its existence), and use a unique identifier — an issue number, a commit SHA, a task ID — as the key so the artifact is findable rather than blindly re-created. A comment carrying [#123] can be located and updated; a branch named feature/issue-123 has a natural uniqueness constraint.
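
The upsert-on-a-marker idea can be sketched as follows. `IssueThread` is a hypothetical in-memory stand-in for an issue tracker's comment list, and `upsert_comment` is an illustrative name, not a real client call; a real tool would list and edit comments through whatever API it wraps.

```python
from dataclasses import dataclass, field


@dataclass
class IssueThread:
    """Hypothetical in-memory stand-in for an issue tracker's comment list."""
    comments: list[str] = field(default_factory=list)


def upsert_comment(thread: IssueThread, marker: str, body: str) -> str:
    """Update the comment carrying `marker` if one exists, otherwise create it."""
    text = f"{marker} {body}"
    for i, existing in enumerate(thread.comments):
        if existing.startswith(marker):
            thread.comments[i] = text  # found by its key: update in place
            return "updated"
    thread.comments.append(text)  # not found: create it, carrying the key
    return "created"


thread = IssueThread()
first = upsert_comment(thread, "[#123]", "Build started")
second = upsert_comment(thread, "[#123]", "Build passed")
```

After both calls the thread holds a single comment with the latest body, which is the end state one clean run would have produced.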

2 Some effects can't be made idempotent — log them

Not every action has a cheap existence check. External calls that create resources (a payment, an email, a webhook trigger) are not idempotent on their own. For these, write an idempotency record: log the operation against a unique key before executing, and check the log before re-executing. The log is the dedup record the resource itself doesn't give you.

Git is the happy exception: committing identical content yields the same tree SHA, and re-pushing an already-pushed branch is a no-op, so file and commit operations are idempotent by nature. Comment posts, label applies, and charges are not, so treat them differently.
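
One way the idempotency record might look, as a sketch under stated assumptions: `EffectLedger` is a hypothetical class, a plain dict stands in for a durable store, and the two-state "started"/"done" status is an added design choice rather than something the lesson prescribes. It is there because logging before executing leaves an ambiguous window if the process dies between the log write and the effect.

```python
from typing import Callable


class EffectLedger:
    """Hypothetical idempotency log. A dict stands in for a durable store."""

    def __init__(self) -> None:
        self.records: dict[str, str] = {}

    def run_once(self, key: str, effect: Callable[[], None]) -> str:
        status = self.records.get(key)
        if status == "done":
            return "skipped"  # a previous run already performed this effect
        if status == "started":
            # A previous run logged the key but never confirmed completion.
            # The outcome is unknown, so refuse to guess.
            raise RuntimeError(f"{key}: outcome unknown, reconcile before retrying")
        self.records[key] = "started"  # log before executing
        effect()
        self.records[key] = "done"
        return "executed"


charges: list[int] = []  # stand-in for the external side effect
ledger = EffectLedger()
first = ledger.run_once("charge:order-42", lambda: charges.append(1999))
second = ledger.run_once("charge:order-42", lambda: charges.append(1999))
```

The key ("charge:order-42" here, an invented example) has to be derived from the task, not from the run, or the second run will never find the first run's record.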

Guard each artifact, not the whole workflow

If the first run crashed after creating the branch but before posting the comment, a single workflow-level "already ran?" guard would wrongly skip the unfinished comment. Put the check-before-act on each action, so a partially-completed run resumes exactly where it stopped instead of short-circuiting on the first artifact it finds.
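
A small simulation of that half-finished run, assuming a hypothetical `Repo` object that plays the role of remote state surviving the crash. Each step carries its own guard, so the re-run skips the branch and still posts the missing comment.

```python
class Repo:
    """Hypothetical stand-in for remote state that survives a crashed run."""

    def __init__(self) -> None:
        self.branches: set[str] = set()
        self.comments: dict[str, str] = {}


def ensure_branch(repo: Repo, name: str) -> bool:
    """Create the branch only if missing; return True when work was done."""
    if name in repo.branches:
        return False
    repo.branches.add(name)
    return True


def ensure_comment(repo: Repo, marker: str, body: str) -> bool:
    """Post the comment only if no comment with this marker exists."""
    if marker in repo.comments:
        return False
    repo.comments[marker] = body
    return True


def run_workflow(repo: Repo, issue: int, crash_after_branch: bool = False) -> list[str]:
    """Run both steps, each behind its own guard; report which ones did work."""
    did: list[str] = []
    if ensure_branch(repo, f"feature/issue-{issue}"):
        did.append("branch")
    if crash_after_branch:
        raise TimeoutError("simulated failure mid-run")
    if ensure_comment(repo, f"[#{issue}]", "Work started"):
        did.append("comment")
    return did


repo = Repo()
try:
    run_workflow(repo, 123, crash_after_branch=True)  # first run dies after the branch
except TimeoutError:
    pass
resumed = run_workflow(repo, 123)  # second run resumes where the first stopped
rerun = run_workflow(repo, 123)    # third run finds nothing left to do
```

A single workflow-level guard keyed on the branch would have returned early on the second run and left the comment unposted.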

3 Where check-before-act breaks

Client-side existence checks are not a universal fix. Three failure modes to design around:

Failure mode | What goes wrong
Concurrency / TOCTOU | Two runs both read "doesn't exist" at once and both create it. Needs a server-side idempotency key with atomic claim semantics, not a client check.
Silent skip hides drift | Short-circuiting on pre-existing state also skips when that state came from a different actor or a stale run. Fail loudly on conflict rather than burying it.
Marker store TTL | A 24-hour dedup table stops protecting older replays. The idempotency record must outlive the worst-case retry horizon.
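
The "fail loudly on conflict" advice could be sketched like this. It assumes the tool can tell who created an artifact, modelled here as a hypothetical dict from branch name to the ID of the task that created it; `ConflictError` and `ensure_branch` are illustrative names.

```python
class ConflictError(Exception):
    """Raised when pre-existing state belongs to someone else."""


def ensure_branch(branches: dict[str, str], name: str, task_id: str) -> str:
    """`branches` maps branch name -> ID of the task that created it (hypothetical)."""
    owner = branches.get(name)
    if owner is None:
        branches[name] = task_id
        return "created"
    if owner == task_id:
        return "exists"  # our own earlier run made it: safe to skip
    # Someone else's state: skipping silently would hide the drift.
    raise ConflictError(f"{name} already exists, created by {owner}")


branches: dict[str, str] = {}
first = ensure_branch(branches, "feature/issue-123", "task-A")
again = ensure_branch(branches, "feature/issue-123", "task-A")
```

A call with a different task ID, such as `ensure_branch(branches, "feature/issue-123", "task-B")`, raises `ConflictError` instead of returning as if everything were fine.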

When duplicates are genuinely costly, prefer atomic upserts, database-backed keys, or a server-enforced unique constraint over an in-tool check — the check tells the agent the truth at one instant, and concurrency moves the truth underneath it.
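
As an illustration of a store-enforced unique constraint, here is a sketch using Python's standard `sqlite3` module. The `claims` table, the `try_claim` helper, and the key format are assumptions made for the example. An in-memory database is used only so the snippet runs on its own; a real deployment would need a store shared by every run.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create a store whose PRIMARY KEY enforces one claim per key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (key TEXT PRIMARY KEY, owner TEXT NOT NULL)")
    return conn


def try_claim(conn: sqlite3.Connection, key: str, owner: str) -> bool:
    """Attempt to claim `key`; True only for the caller whose insert succeeds."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO claims (key, owner) VALUES (?, ?)", (key, owner)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another run already holds this key


conn = make_store()
results = [
    try_claim(conn, "charge:order-42", "run-1"),
    try_claim(conn, "charge:order-42", "run-2"),
]
row_count = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
```

There is no separate read step to race against: the insert is both the check and the claim, so only the run that gets `True` back goes on to perform the effect.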

↪ Your win: design for the second run, not just the first

Retrieval practice — recall, don't peek

Question 1: An operation is idempotent when running it twice…

Question 2: The foundational technique for safe re-runs is to…

Question 3: For a charge or email that can't be checked for existence, you should…

Question 4: Two concurrent runs that both pass an existence check then both create the resource is a…

Question 5 · spaced recall from Lesson 06: Two tools that are always called together should be…

Next: Annotations & Safe Concurrency — how the harness reads readOnlyHint and idempotentHint, and what breaks when they lie.