Part 4 · Reliability Under Failure

Tool Engineering · ~7 min

Annotations & Safe Concurrency

A tool annotation looks like a passive safety badge. The moment a harness reads it to decide whether to run calls in parallel, it stops being advisory and starts governing execution.

Why this, for you: the surface where a one-word lie becomes a data race. Annotations let a harness overlap your read-only calls for a real wall-clock win — but only if they're honest. This lesson shows what the flags mean, why they're load-bearing now, and the audit that has to run before you trust them.

MCP tools carry four advisory annotations: readOnlyHint, destructiveHint, idempotentHint, and openWorldHint. For a long time these were cosmetic — they changed a confirmation prompt, nothing more. Then harnesses started wiring them into the dispatch path, and a misannotation stopped being a UX detail and became a correctness bug.
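As a rough illustration, a tool entry carrying these hints might look like the sketch below. The descriptor contents and the `hints` helper are assumptions made for this example, not part of any SDK. The fallback values mirror the defaults documented in the MCP schema (readOnlyHint false, destructiveHint true, idempotentHint false, openWorldHint true).

```python
# Defaults the MCP schema documents for a hint the server leaves out.
DEFAULT_HINTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}

# Hypothetical tool descriptor, shaped like one entry in a tool listing.
search_docs = {
    "name": "search_docs",
    "description": "Full-text search over the docs index.",
    "annotations": {
        "readOnlyHint": True,
        "idempotentHint": True,
        "openWorldHint": False,
    },
}


def hints(tool: dict) -> dict:
    """Return all four hints, filling in the default for any that are missing."""
    return {**DEFAULT_HINTS, **(tool.get("annotations") or {})}
```

A tool with no annotations at all resolves to the cautious reading: not read-only, possibly destructive, not idempotent, open-world.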

When a harness reads readOnlyHint: true and lifts the sequential gate on that basis, the annotation governs execution semantics, not just UX. Codex CLI 0.134.0 shipped exactly this: read-only tools automatically qualify for parallel dispatch. A tool that declares readOnlyHint: true but secretly mutates now produces racing writes the moment the agent issues two calls in one turn.

1 Why honest read-only enables a real speedup

Read-only tools, by contract, don't mutate state shared across calls, so two concurrent invocations can't interfere through tool effects. The only things they share are the transport and the server's process budget. That lets the harness collapse wall-clock cost for N read calls from sum(latency) toward max(latency) plus dispatch overhead.

# Sequential: pay every latency in series
search_docs    850ms ──┐
kb_lookup     1050ms   ├─ total 2400ms
get_customer   500ms ──┘

# Parallel (all three honestly readOnlyHint:true)
search_docs    850ms ─┐
kb_lookup     1050ms ─┼─ total 1050ms (max, not sum)
get_customer   500ms ─┘

It's also cheap to wire: no planner, no dependency DAG, just a static lookup of the annotation in the tool list. The hint defaults to false, so a server that omits annotations stays sequential — conservative by design. The author opts into concurrency by setting one boolean, and accepts the responsibility that comes with it.
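The static lookup described above can be sketched in a few lines. This is a minimal illustration, not how any particular harness implements it: `dispatch_turn`, the `invoke` transport callable, and the all-or-nothing rule (overlap only when every call in the turn is read-only) are assumptions made for the example.

```python
import asyncio


def is_read_only(tool: dict) -> bool:
    # A missing annotation resolves to False, so the tool stays sequential.
    return bool((tool.get("annotations") or {}).get("readOnlyHint", False))


async def dispatch_turn(calls, tools, invoke):
    """Run one turn's tool calls, overlapping them only if all are read-only.

    calls:  list of (tool_name, args) pairs issued in a single model turn
    tools:  dict mapping tool_name to its descriptor
    invoke: async callable (tool_name, args) -> result (hypothetical transport)
    """
    if calls and all(is_read_only(tools[name]) for name, _ in calls):
        # asyncio.gather returns results in the order the calls were given.
        return list(await asyncio.gather(*(invoke(n, a) for n, a in calls)))
    results = []
    for name, args in calls:
        # Any tool without the hint keeps the whole turn behind the sequential gate.
        results.append(await invoke(name, args))
    return results
```

Because results come back in call order either way, the caller sees the same shape whether or not the turn was overlapped.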

2 Annotations are untrusted until audited

The MCP spec is blunt: clients must treat annotations as untrusted unless they come from a trusted server. The flag is a claim by the tool author, not a guarantee the harness can verify. The most common and most dangerous misannotation is a tool marked readOnlyHint: true that actually writes — logs the access, bumps a last_seen timestamp, increments a counter. Sequentially that's invisible; under parallel dispatch, two calls race on the same write and the agent reasons over an inconsistent result.
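One way to catch this kind of misannotation is to snapshot the backing state, call the tool once, and compare. The sketch below assumes the state is an in-memory dict the audit can copy; a real backend would need a database diff or a write-audit log instead. `audit_read_only` and both example tools are hypothetical.

```python
import copy


def audit_read_only(tool_fn, args: dict, state: dict) -> bool:
    """Return True if a single call to tool_fn leaves `state` unchanged."""
    before = copy.deepcopy(state)
    tool_fn(state, **args)
    return state == before


# Hypothetical tool that claims to be read-only but bumps a counter on every read.
def get_customer(state, customer_id):
    record = state["customers"][customer_id]
    record["last_seen"] += 1  # hidden write
    return record["name"]


# The same lookup with the hidden write removed.
def get_customer_pure(state, customer_id):
    return state["customers"][customer_id]["name"]
```

A single clean call doesn't prove a tool is pure on every code path, so an audit like this is a floor, not a guarantee.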

Pair idempotentHint with readOnlyHint

A read that fails transiently must be safe to retry. Pure reads are idempotent by definition — but setting idempotentHint: true alongside readOnlyHint makes that explicit and gives the harness a safe recovery path on a dropped call. This is the same safe-to-retry property from the previous lesson, now declared on the surface instead of enforced in the body.
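A harness could use the declared flag as the gate for its retry path. The following is a sketch under assumptions: `TransientError`, `call_with_retry`, and the choice to retry only when idempotentHint is explicitly true are illustrative, not taken from any harness.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a dropped or timed-out call."""


def call_with_retry(tool: dict, invoke, args: dict, attempts: int = 3, delay: float = 0.0):
    """Call invoke(args), retrying transient failures only for declared-idempotent tools."""
    annotations = tool.get("annotations") or {}
    retry_safe = bool(annotations.get("idempotentHint", False))
    tries = attempts if retry_safe else 1
    for attempt in range(1, tries + 1):
        try:
            return invoke(args)
        except TransientError:
            if attempt == tries:
                raise  # out of attempts, or never allowed to retry
            time.sleep(delay)
```

Under this rule a tool that omits idempotentHint gets exactly one attempt, even if it is marked read-only, which is the cost of leaving the property implicit.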

3 When parallel dispatch backfires

Even with honest annotations, concurrency isn't free. Audit for these before flipping the flag:

| Condition | Why parallel hurts |
| --- | --- |
| Rate-limited backend | Concurrent reads against a per-second-capped API hit 429s that sequential calls would have spaced out: wall-clock gain traded for recovery turns. |
| Weakly consistent replicas | List-then-get across read replicas can return divergent views to concurrent reads; the agent reasons over a self-inconsistent picture. |
| No per-server cap | Ten reads fan out against a server sized for sequential traffic and blow its connection or process budget; the harness win becomes a server outage. |
| Model can't interleave | Concurrent dispatch returns results out of order; a model that degrades on interleaved-ledger reasoning underperforms the sequential baseline. |
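The missing per-server cap is the easiest of these to guard against in code. A minimal sketch using asyncio.Semaphore follows; `gather_capped` and the idea of passing zero-argument async callables are assumptions made for the example.

```python
import asyncio


async def gather_capped(coro_fns, max_in_flight: int):
    """Run zero-argument async callables with at most max_in_flight active at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def run(fn):
        async with gate:  # waits here while the server is at its cap
            return await fn()

    # Results come back in input order, regardless of completion order.
    return list(await asyncio.gather(*(run(fn) for fn in coro_fns)))
```

With a cap of 1 this degrades to sequential dispatch, so the same code path can serve servers that haven't been cleared for concurrency.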

The safe operator posture for many third-party servers is to leave per-server concurrency disabled until an upstream idempotency audit has confirmed that every tool marked read-only really is read-only. The annotation is a promise; the audit is what makes it load-bearing.
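That posture can be expressed as a second gate in front of the annotation. In this sketch the allowlist `AUDITED_SERVERS`, the server names, and `may_parallelise` are all hypothetical operator-side constructs, not part of MCP.

```python
# Hypothetical operator config: servers whose read-only claims have passed an audit.
AUDITED_SERVERS = frozenset({"internal-kb"})


def may_parallelise(server: str, tool: dict, audited=AUDITED_SERVERS) -> bool:
    """Honour readOnlyHint for dispatch only when the server is on the audited list."""
    annotations = tool.get("annotations") or {}
    claims_read_only = bool(annotations.get("readOnlyHint", False))
    return claims_read_only and server in audited
```

The hint alone is never sufficient here: an unaudited server stays sequential no matter what its tools declare.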

↪ Your win: make annotations honest, then let the harness exploit them

  • Annotations now govern execution, not just prompts — a harness can parallelise on readOnlyHint.
  • Honest read-only collapses wall-clock from sum to max at near-zero wiring cost.
  • Treat annotations as untrusted — a mis-marked mutating tool turns parallel reads into a race.
  • Set idempotentHint with readOnlyHint so transient-failure retries are safe.
  • Audit rate limits, replica consistency, and per-server caps before enabling concurrency.

Retrieval practice — recall, don't peek

Question 1: Once a harness dispatches in parallel on it, readOnlyHint governs…

Question 2: Honest read-only tools let the harness cut wall-clock cost from…

Question 3: The MCP spec says tool annotations should be treated as…

Question 4: The most dangerous misannotation under parallel dispatch is a tool that…

Question 5 · spaced recall from Lesson 07: The foundational technique for making a re-run safe is to…

Ask me anything. Want the audit checklist for declaring a tool readOnlyHint: true, or how hint-driven concurrency trades against an explicit dependency DAG? Next, Part 5 opens with Tool Discoverability at Scale — keeping selection sharp as the catalog grows past the point a model can hold it all in view.