Tool Engineering · ~7 min
Agents fail mid-task and get re-run from scratch — with no memory of what the first run already did. A tool that wasn't designed for that turns a retry into a duplicate.
Agents fail in ordinary ways: the context window fills, an API times out, a tool error halts the loop. When you re-run the agent, it starts fresh. If the first run created a branch, posted a comment, or charged a card before failing, the second run walks into pre-existing state it can't see, and the design of your tool decides whether the retry heals that state or compounds it with duplicates.
The foundational technique is one read before every write: before creating, check if it already exists; before posting, check if an equivalent is already there. The cost is a single read operation; the alternative is duplicate state the agent can't reason about.
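A minimal sketch of that read-before-write shape, assuming a hypothetical in-memory `FakeRepo` in place of a real code host (the class and its method names are illustrative, not any real API):

```python
class FakeRepo:
    """Hypothetical in-memory stand-in for a code host; not a real API."""

    def __init__(self):
        self.branches = set()
        self.create_calls = 0

    def branch_exists(self, name: str) -> bool:
        return name in self.branches

    def create_branch(self, name: str) -> None:
        self.create_calls += 1
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches.add(name)


def ensure_branch(repo: FakeRepo, name: str) -> str:
    """Read before write: create the branch only if it is missing."""
    if repo.branch_exists(name):
        return "exists"
    repo.create_branch(name)
    return "created"


repo = FakeRepo()
first = ensure_branch(repo, "feature/issue-123")   # first run creates it
second = ensure_branch(repo, "feature/issue-123")  # re-run finds it and skips
```

The second call costs one read and performs no write, which is the trade the paragraph above describes.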
The same shape generalises: upsert over create (update the existing artifact instead of failing
on its existence), and use a unique identifier — an issue number, a commit SHA, a task ID — as the
key so the artifact is findable rather than blindly re-created. A comment carrying [#123] can be
located and updated; a branch named feature/issue-123 has a natural uniqueness constraint.
Not every action has a cheap existence check. External calls that create resources (a payment, an email, a webhook trigger) are not idempotent on their own. For these, write an idempotency record: log the operation against a unique key before executing, and check the log before re-executing. The log is the dedup record the resource itself doesn't give you.
Git is the happy exception. Identical file content hashes to the same blob and tree SHA, and re-pushing an already-pushed branch is a no-op, so writing files and pushing are safe to repeat. (A new commit object still gets its own SHA, because it records a parent and a timestamp, so check for an existing commit before creating another.) Comment posts, label applies, and charges are not idempotent, so treat them differently.
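One way the idempotency record could look, as a sketch only. The `Ledger` class, the `charge_once` helper, and the key format are hypothetical, and a real ledger would need durable storage. The choice to raise on a `pending` record is also an assumption: if a run died between logging and finishing, the code cannot know whether the charge went through, so it refuses to guess.

```python
class Ledger:
    """Hypothetical idempotency log; a real one would live in durable storage."""

    def __init__(self):
        self.records = {}


def charge_once(ledger: Ledger, key: str, amount_cents: int, charge):
    """Run `charge` at most once per key, replaying the saved result after that."""
    record = ledger.records.get(key)
    if record is not None:
        if record["status"] == "done":
            return record["result"]  # replay: no second charge
        # Logged but never finished: outcome unknown, so surface it.
        raise RuntimeError(f"{key}: earlier attempt did not finish; reconcile first")
    ledger.records[key] = {"status": "pending", "result": None}  # log before executing
    result = charge(amount_cents)
    ledger.records[key] = {"status": "done", "result": result}
    return result


calls = []

def fake_charge(amount_cents: int) -> str:
    """Stand-in for a payment call; records each invocation."""
    calls.append(amount_cents)
    return f"receipt-{len(calls)}"


ledger = Ledger()
a = charge_once(ledger, "task-42:charge", 1999, fake_charge)
b = charge_once(ledger, "task-42:charge", 1999, fake_charge)  # re-run
```

Both calls return the same receipt and `fake_charge` runs once.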
If the first run crashed after creating the branch but before posting the comment, a single workflow-level "already ran?" guard would wrongly skip the unfinished comment. Put the check-before-act on each action, so a partially-completed run resumes exactly where it stopped instead of short-circuiting on the first artifact it finds.
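The difference is easier to see in a toy workflow. This sketch assumes a plain dict stands in for the external state, and the `crash_after_branch` flag exists only to simulate the mid-run failure:

```python
def run_workflow(state: dict, crash_after_branch: bool = False) -> list:
    """Each step checks its own artifact, so a re-run resumes mid-workflow."""
    performed = []
    if "branch" not in state:  # guard on the branch only
        state["branch"] = "feature/issue-123"
        performed.append("create_branch")
    if crash_after_branch:
        raise RuntimeError("simulated crash")
    if "comment" not in state:  # separate guard on the comment
        state["comment"] = "[#123] Branch ready"
        performed.append("post_comment")
    return performed


state = {}
try:
    run_workflow(state, crash_after_branch=True)  # dies after the branch step
except RuntimeError:
    pass
resumed = run_workflow(state)  # only the unfinished step runs
again = run_workflow(state)    # nothing left to do
```

A single guard on `"branch" in state` at the top of the function would have returned early on the second run and left the comment unposted.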
Client-side existence checks are not a universal fix. Three failure modes to design around:
| Failure mode | What goes wrong |
|---|---|
| Concurrency / TOCTOU | Two runs both read "doesn't exist" at once and both create it. Needs a server-side idempotency key with atomic claim semantics, not a client check. |
| Silent skip hides drift | Short-circuiting on pre-existing state also skips when that state came from a different actor or a stale run. Fail loudly on conflict rather than burying it. |
| Marker store TTL | A dedup table with a 24-hour TTL stops protecting against replays that arrive after the record expires. The idempotency record must outlive the worst-case retry horizon. |
When duplicates are genuinely costly, prefer atomic upserts, database-backed keys, or a server-enforced unique constraint over an in-tool check — the check tells the agent the truth at one instant, and concurrency moves the truth underneath it.
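A sketch of an atomic claim backed by a unique constraint, using Python's built-in `sqlite3` module with an in-memory database. The table name, key format, and `claim` helper are assumptions for illustration; a production setup would use whatever shared store the concurrent runs can all reach.

```python
import sqlite3


def claim(conn: sqlite3.Connection, key: str) -> bool:
    """Atomically claim a key; returns True only for the first caller."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO idempotency_keys (key) VALUES (?)", (key,))
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint rejected a duplicate claim.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY)")

won = claim(conn, "issue-123:create-branch")   # first run wins the claim
lost = claim(conn, "issue-123:create-branch")  # second run is refused
```

There is no separate read here: the insert is both the check and the write, so the store decides the winner and no window opens between them.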
Retrieval practice — recall, don't peek
Question 1: An operation is idempotent when running it twice…
Question 2: The foundational technique for safe re-runs is to…
Question 3: For a charge or email that can't be checked for existence, you should…
Question 4: Two concurrent runs that both pass an existence check then both create the resource is a…
Question 5 · spaced recall from Lesson 06: Two tools that are always called together should be…