Tool Engineering · ~7 min
Lesson 3 budgeted the tokens of one call's output. This one budgets the whole toolset — the tokens it costs before a single call, and the wall-clock it costs across many — as a line item you track, not a side effect you discover.
A correct toolset can still be a costly one. The catalog itself has a price the agent pays on every turn before it acts, and a multi-step task pays a latency tax that compounds across calls. Neither shows up in a unit test — they show up in the bill and the wall clock — so you budget them deliberately.
Tool definitions are injected into context, so the catalog spends tokens whether or not a tool is called. Three moves, each from earlier in the course, cut that ledger:
| Move | What it saves |
|---|---|
| Defer the long tail (Lesson 9) | Keeps rarely used tool definitions out of the prefix until a search needs them: the difference between ~1.17M and ~8.7K tokens at the extreme. |
| Trim schemas (Lesson 10) | Schemas dominate per-tool token cost; drop optional fields and dedupe shared types so each definition is cheaper. |
| Clear tool results (Lessons 3 and 4) | Return only the next decision's inputs, and let the harness clear stale results, "one of the safest, lightest-touch forms of compaction." |
The framing is the budget itself: every definition and every result competes for the same window. A toolset that's affordable on turn one can starve the task by turn twenty if nothing is ever cleared.
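The ledger can be sketched in a few lines of Python. This is a minimal illustration, not a real harness API: the `ToolDef` type, the tool names, and every token count below are made-up assumptions, chosen only to show how un-deferred definitions and uncleared results draw on the same window.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolDef:
    name: str
    schema_tokens: int      # hypothetical token count for this definition
    deferred: bool = False  # deferred definitions stay out of the prefix


def prefix_tokens(catalog: List[ToolDef]) -> int:
    """Tokens the catalog costs on every turn, before any call is made."""
    return sum(t.schema_tokens for t in catalog if not t.deferred)


def context_used(catalog: List[ToolDef],
                 result_tokens_by_turn: List[int],
                 keep_last: Optional[int] = None) -> int:
    """Tokens occupied after the last turn: definitions plus retained results.

    keep_last=None keeps every result; keep_last=k keeps the k most recent.
    """
    if keep_last is None:
        kept = result_tokens_by_turn
    else:
        start = max(0, len(result_tokens_by_turn) - keep_last)
        kept = result_tokens_by_turn[start:]
    return prefix_tokens(catalog) + sum(kept)


# Illustrative numbers only (assumptions, not measurements).
catalog = [
    ToolDef("search_orders", 400),
    ToolDef("get_order", 300),
    ToolDef("export_audit_log", 900, deferred=True),
]
results = [2000] * 20  # twenty turns, 2,000 result tokens each

never_cleared = context_used(catalog, results)         # 40700
cleared = context_used(catalog, results, keep_last=3)  # 6700
```

With these assumed numbers the definitions cost the same 700 tokens either way; what separates 40,700 from 6,700 is whether stale results are ever cleared.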
A multi-step task pays latency per call, and serial calls add up. The single biggest lever is the one from Lesson 8: independent read-only calls don't have to run in series.
The wall-clock cost of N independent reads drops from sum(latency) toward max(latency).
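A small `asyncio` sketch makes the sum-versus-max difference concrete. The `read_tool` coroutine and its latencies are hypothetical stand-ins for independent read-only calls; a real dispatcher would also have to respect backend rate limits.

```python
import asyncio
import time

# Hypothetical tools and latencies in seconds (assumptions for illustration).
LATENCIES = {"get_user": 0.05, "get_orders": 0.10, "get_invoices": 0.15}


async def read_tool(name: str, latency_s: float) -> str:
    """Stand-in for an independent, read-only tool call."""
    await asyncio.sleep(latency_s)
    return name


async def run_serial() -> float:
    """Await each call in turn: wall-clock approaches sum(latency)."""
    start = time.perf_counter()
    for name, latency in LATENCIES.items():
        await read_tool(name, latency)
    return time.perf_counter() - start


async def run_parallel() -> float:
    """Dispatch all calls at once: wall-clock approaches max(latency)."""
    start = time.perf_counter()
    await asyncio.gather(*(read_tool(n, l) for n, l in LATENCIES.items()))
    return time.perf_counter() - start


serial_s = asyncio.run(run_serial())      # roughly 0.30 s with these latencies
parallel_s = asyncio.run(run_parallel())  # roughly 0.15 s with these latencies
```

The calls must be genuinely independent for this to be safe: if one read needs another's result, it stays in series.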
Consolidating co-called tools (Lesson 6) cuts latency a second way — one round-trip instead of a chained five — and
deferring removes the discovery round-trip from the common path. Cost work and correctness work are the same moves,
read against a different ledger.
Budgeting is picking a point on each axis, not minimising blindly. The same trade-offs that ran through the course reappear as cost decisions:
An over-trimmed output that omits a field the agent needs forces an extra round-trip — sometimes more expensive than returning the field once. Parallel dispatch against a rate-limited backend buys wall-clock with 429-recovery turns and burned quota. A deferred hot tool adds a cold-start search to the very first turn that needs it. The cheapest call in isolation can be the most expensive across the task — budget the task, not the call.
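The first of these trade-offs reduces to simple arithmetic. The sketch below assumes a fixed per-call overhead (the tool-call message plus result framing) and uses made-up token counts; the `task_tokens` helper is hypothetical, not part of any library.

```python
def task_tokens(result_tokens: int, calls: int, per_call_overhead: int) -> int:
    """Total tokens a task spends on one tool: every call pays overhead plus result."""
    return calls * (per_call_overhead + result_tokens)


OVERHEAD = 150  # assumed tokens per round-trip, for illustration only

# Over-trimmed: a cheaper result, but the missing field forces a second call.
trimmed = task_tokens(result_tokens=80, calls=2, per_call_overhead=OVERHEAD)    # 460

# Complete: a larger result that answers the question in one call.
complete = task_tokens(result_tokens=120, calls=1, per_call_overhead=OVERHEAD)  # 270
```

Per call, the trimmed output is cheaper (230 tokens against 270); across the task it costs more, and that is before counting the extra round-trip's latency.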
The discipline's standing caveat applies hardest here: this engineering pays back on durable, shared, high-traffic toolsets. For a one-off script the catalog tax is paid once and the latency never compounds — there's no budget worth keeping.
Retrieval practice: recall, don't peek
Question 1: The token cost of a large tool catalog is paid…
Question 2: Overlapping independent reads drops the latency ledger from…
Question 3: Within a single tool definition, the dominant token cost is usually the…
Question 4: An over-trimmed output that omits a needed field is costly because it forces…
Question 5 · spaced recall from Lesson 10: Read-only context the agent should see but not invoke is best exposed as an MCP…