Tool Engineering · ~7 min
Lesson 6 thinned one toolset. But some agents legitimately need dozens of tools — and past a point the model can't hold them all in view. The fix isn't fewer tools; it's loading them at the right time.
Consolidation cuts overlap, but it can't shrink a genuinely large surface area: a coding agent with search, source control, CI, observability, and ticketing servers can easily exceed what fits comfortably in context. Showing every tool on every turn has two costs that compound: prefix tokens spent whether or not the tool is used, and selection accuracy that degrades once too many tools are visible at once.
The modern answer is to not load everything. Hosts now support deferred tools: instead of sitting in the system-prompt prefix, a deferred tool waits behind a tool search step. The model issues a search, gets back a few matching tool references, and those expand inline only when needed.
Eager tools pay their token cost every turn but are available immediately. Deferred tools cost a search round-trip on first reference but stay out of the prefix the rest of the time. Crucially, deferral preserves the prompt cache: the definition is appended inline as a reference block rather than mutating the cached prefix.
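The trade-off above can be made concrete with rough arithmetic. The following is a minimal sketch, not a billing model: it only counts how many definition tokens sit in context across a session, and it ignores the tokens of the search call and its results. The tool names, token sizes, and the 500-token search tool are all made-up assumptions.

```python
def eager_context_tokens(tool_tokens: dict[str, int], turns: int) -> int:
    """Every definition sits in the prefix on every turn."""
    return sum(tool_tokens.values()) * turns


def deferred_context_tokens(
    tool_tokens: dict[str, int],
    eager: set[str],
    first_use_turn: dict[str, int],
    turns: int,
    search_tool_tokens: int = 500,  # assumed size of the search tool's own definition
) -> int:
    """Eager tools and the search tool are always present; a deferred tool
    counts only from the turn it is first loaded (1-indexed) onward."""
    total = (search_tool_tokens + sum(tool_tokens[n] for n in eager)) * turns
    for name, turn in first_use_turn.items():
        if name not in eager:
            total += tool_tokens[name] * (turns - turn + 1)
    return total


# Hypothetical definition sizes, in tokens.
TOOL_TOKENS = {
    "search_code": 400,
    "git_diff": 300,
    "ci_status": 900,
    "query_logs": 1200,
    "create_ticket": 800,
}

all_eager = eager_context_tokens(TOOL_TOKENS, turns=10)
mostly_deferred = deferred_context_tokens(
    TOOL_TOKENS,
    eager={"search_code", "git_diff"},
    first_use_turn={"query_logs": 8},  # only one deferred tool is ever needed
    turns=10,
)
print(all_eager, mostly_deferred)
```

With these invented numbers the all-eager session carries 36,000 definition tokens over ten turns and the mostly-deferred one carries 15,600. The gap narrows as more deferred tools get loaded early in the session.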
Decide per server — or per tool — using four signals. The published floor: enabling tool search is worth it at roughly 10+ tools or >10K tokens of definitions; below that, eager-load everything.
| Signal | Eager-load | Defer (just-in-time) |
|---|---|---|
| Hit rate | Used in most sessions | Used in a small minority |
| Definition size | Small against the budget | Large enough to dominate |
| Latency tolerance | Cold round-trip is felt | A search step is acceptable |
| Description quality | Keywords don't match task language | Trustworthy under search |
The guidance operationalises the first two as a rule of thumb: keep your 3–5 most frequently used tools eager for best performance, and defer the long tail.
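The floor and the rule of thumb can be sketched as a small planning function. This is an illustration under assumptions, not a host API: `ToolStats`, `plan_loading`, and the `sessions_used` field are hypothetical names, and hit rate is the only signal used for ranking. The thresholds (10 tools, 10K tokens, up to 5 eager tools) come from the text above.

```python
from dataclasses import dataclass


@dataclass
class ToolStats:
    name: str
    definition_tokens: int
    sessions_used: int  # sessions in which the tool was called at least once


def plan_loading(
    tools: list[ToolStats],
    max_eager: int = 5,
    min_tools: int = 10,
    min_tokens: int = 10_000,
) -> dict[str, str]:
    """Return a mapping of tool name to 'eager' or 'deferred'."""
    total_tokens = sum(t.definition_tokens for t in tools)
    # Below the floor, tool search is not worth it: load everything eagerly.
    if len(tools) < min_tools and total_tokens <= min_tokens:
        return {t.name: "eager" for t in tools}
    # Above the floor, keep the most frequently used tools eager
    # and defer the long tail.
    ranked = sorted(tools, key=lambda t: t.sessions_used, reverse=True)
    eager_names = {t.name for t in ranked[:max_eager]}
    return {t.name: "eager" if t.name in eager_names else "deferred" for t in tools}
```

A fuller version would also weigh definition size, latency tolerance, and description quality per tool, for example by refusing to defer a tool whose description has not been audited.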
Deferral introduces a failure mode consolidation doesn't have. A deferred tool is only findable if the model can phrase a search query that matches its description. If the description uses internal jargon the model won't search for, the tool is invisible — the agent reports "no tool available" when the tool is right there.
Before you defer a tool, audit its description against the task language a model would actually use — not the domain's internal vocabulary. Deferral turns the description-quality lesson into a hard dependency: a vague description that merely cost you a wrong selection when eager now costs you a tool that can't be found at all.
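One way to run that audit is to write down the phrases a model would plausibly search for and check whether each description can be reached from them. The sketch below uses naive word overlap as a stand-in for whatever matching the host's tool search actually performs, which is an assumption. The tool names, descriptions, and queries are invented for illustration.

```python
import re

# Small stopword list so that words like "a" or "the" don't count as a match.
STOPWORDS = {"a", "an", "the", "of", "to", "for", "via", "in", "on", "or", "and"}


def content_words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def unfindable_tools(
    descriptions: dict[str, str],
    task_queries: dict[str, list[str]],
) -> list[str]:
    """Return tools whose description shares no content word with any of the
    task-language queries expected to lead to them."""
    missing = []
    for name, description in descriptions.items():
        described = content_words(description)
        queries = task_queries.get(name, [])
        if not any(content_words(q) & described for q in queries):
            missing.append(name)
    return missing


descriptions = {
    "pd_ack": "Acks a PD incident via the IM bridge.",
    "create_ticket": "Create a ticket or bug report in the issue tracker.",
}
task_queries = {
    "pd_ack": ["acknowledge an alert", "silence a page"],
    "create_ticket": ["file a bug report", "open a ticket"],
}
print(unfindable_tools(descriptions, task_queries))
```

Here `pd_ack` is flagged because its description is written entirely in internal shorthand, so it should be rewritten in task language before it is deferred.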
At the far end of scale, with catalogs in the hundreds or thousands of tools, in-context enumeration stops working entirely, and retrieval-based selection (embedding search over a tool index) replaces it. The principle is the same one Lesson 6 started from: the agent should see the right tools, not all of them.
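The shape of retrieval-based selection can be sketched in a few lines. To keep the example self-contained, it swaps a real embedding model for bag-of-words vectors with cosine similarity. That is a deliberate simplification, and the tool catalog is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(descriptions: dict[str, str]) -> dict[str, Counter]:
    """Embed every tool description once, ahead of time."""
    return {name: embed(text) for name, text in descriptions.items()}


def top_k(query: str, index: dict[str, Counter], k: int = 3) -> list[str]:
    """Return up to k tool names ranked by similarity to the query."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), name) for name, vec in index.items()), reverse=True)
    return [name for score, name in scored[:k] if score > 0]


catalog = {
    "ci_status": "Get the status of a CI build or pipeline run.",
    "query_logs": "Search application logs by service and time range.",
    "create_ticket": "Create a ticket or bug report in the issue tracker.",
    "git_blame": "Show who last changed each line of a file.",
}
index = build_index(catalog)
print(top_k("why did the ci build fail", index, k=2))
```

Only the tools returned by `top_k` would be shown to the model for that turn. A production setup would use a real embedding model and a vector index, but the description-quality dependency is the same: a tool is retrieved only if its description sits close to how the task is phrased.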
Retrieval practice — recall, don't peek
Question 1: Tool selection accuracy degrades significantly once visible tools exceed about…
Question 2: A deferred tool, versus an eager one, trades prefix tokens for…
Question 3: The guidance says to keep eager the tools that are your…
Question 4: "Tool not available" when the tool exists is the failure caused by…
Question 5 · spaced recall from Lesson 08: A harness that parallelises read-only tools is only safe once the annotations have been…