Part 2 · Building Tools That Drive Well

MCP Server Design

Found and Versioned

A great server nobody can find is dead weight. So is one that floods the context window. Discoverability and token cost are two sides of the same budget, and together they shape how your server evolves.

Why this, for you: your server's tool schemas are injected into the agent's context at startup, and they cost real tokens whether or not anyone calls them. Designing for discovery — small surface, search-friendly descriptions, a clear load policy — is what keeps your server installed instead of evicted for budget.

Those tokens add up fast. The GitHub MCP server alone has been measured at roughly 55,000 tokens across its 93 tool definitions, and stacking several servers can consume a third or more of a 200K-token window before any user input arrives. Discovery is the mechanism that makes a large catalog affordable.
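To make the arithmetic concrete, here is a back-of-envelope calculation using only the GitHub figures quoted above. The constant and function names are illustrative, and real counts depend on the tokenizer and on how the host serializes each schema.

```python
# Back-of-envelope context budget. Only the GitHub figures come from the text;
# the names here are illustrative.
WINDOW_TOKENS = 200_000        # a 200K-token context window
GITHUB_SCHEMA_TOKENS = 55_000  # measured size of the GitHub server's definitions
GITHUB_TOOL_COUNT = 93


def window_share(schema_tokens: int, window: int = WINDOW_TOKENS) -> float:
    """Fraction of the window consumed by definitions before any user input."""
    return schema_tokens / window


def avg_tokens_per_tool(schema_tokens: int, tool_count: int) -> float:
    """Mean definition size, handy for estimating the cost of one more tool."""
    return schema_tokens / tool_count


share = window_share(GITHUB_SCHEMA_TOKENS)
per_tool = avg_tokens_per_tool(GITHUB_SCHEMA_TOKENS, GITHUB_TOOL_COUNT)
# Tokens another server could add before the total crosses a third of the window.
headroom = WINDOW_TOKENS / 3 - GITHUB_SCHEMA_TOKENS

print(f"{share:.1%}")       # 27.5%
print(f"{per_tool:.0f}")    # 591
print(f"{headroom:,.0f}")   # 11,667
```

In other words, one GitHub server already spends over a quarter of the window, and a second server of under 12K tokens pushes the total past a third.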

1 Eager vs. just-in-time loading

Modern hosts let each server choose whether its tools live in context from turn one (eager) or sit behind a tool-search step that loads them on first reference (just-in-time, or JIT). Claude Code exposes this as alwaysLoad; the API offers a per-tool defer_loading flag. Four signals decide which to use:

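| Signal | Eager (alwaysLoad: true) | JIT (default) |
| --- | --- | --- |
| Hit rate | Used in most sessions | Used in <20% of sessions |
| Definition size | Small relative to the token budget | Large enough to dominate the budget |
| Latency tolerance | An extra search round-trip would be felt | A search step is acceptable |
| Description quality | Keywords don't match how tasks are phrased | Descriptions surface reliably under search |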
Anthropic's floor: enable tool search once you have 10+ tools or more than 10K tokens of definitions. Below that, eager-load everything. Above it, keep your 3–5 most-used tools non-deferred for best performance.
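The floor and the hit-rate and size signals can be combined into a rough load plan. This is a minimal sketch, not a host API: ToolStats, plan_loading, and apply_plan are hypothetical names, the 0.5 cutoff is one assumed reading of "used in most sessions", the top-3 pin is the low end of the 3–5 guidance, and apply_plan assumes the defer_loading flag sits at the top level of each tool definition dict. Latency tolerance and description quality are not modeled and still need human judgment.

```python
from dataclasses import dataclass

# Thresholds from the guidance above; tune them against your own telemetry.
TOOL_COUNT_FLOOR = 10   # enable tool search at 10+ tools...
TOKEN_FLOOR = 10_000    # ...or more than 10K tokens of definitions
PINNED_TOP_K = 3        # keep the 3-5 most-used tools non-deferred
MOST_SESSIONS = 0.5     # assumed reading of "used in most sessions"


@dataclass
class ToolStats:
    name: str
    definition_tokens: int   # measured size of schema + description
    session_hit_rate: float  # fraction of sessions that call the tool, 0..1


def plan_loading(tools: list[ToolStats]) -> dict[str, bool]:
    """Map each tool name to a defer_loading value (True means just-in-time)."""
    total = sum(t.definition_tokens for t in tools)
    if len(tools) < TOOL_COUNT_FLOOR and total <= TOKEN_FLOOR:
        # Below the floor: skip tool search and eager-load everything.
        return {t.name: False for t in tools}

    # Pin the most-used tools so they never pay the search round-trip.
    by_use = sorted(tools, key=lambda t: t.session_hit_rate, reverse=True)
    pinned = {t.name for t in by_use[:PINNED_TOP_K]}

    # Everything else defaults to JIT unless it is used in most sessions.
    return {
        t.name: not (t.name in pinned or t.session_hit_rate > MOST_SESSIONS)
        for t in tools
    }


def apply_plan(definitions: list[dict], plan: dict[str, bool]) -> list[dict]:
    """Return copies of the definitions with defer_loading set from the plan.

    Assumption: the flag lives at the top level of each definition dict.
    """
    return [{**d, "defer_loading": plan[d["name"]]} for d in definitions]


# Hypothetical catalog: 12 tools, so the 10-tool floor is crossed.
rates = [0.9, 0.7, 0.6, 0.55, 0.3] + [0.05] * 7
catalog = [ToolStats(f"tool_{i}", 800, r) for i, r in enumerate(rates)]
plan = plan_loading(catalog)
print(sorted(name for name, deferred in plan.items() if not deferred))
# ['tool_0', 'tool_1', 'tool_2', 'tool_3']
```

Here tool_0 through tool_2 are pinned as the top three, tool_3 stays eager because it is used in most sessions, and the long tail is deferred.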

Deferred loading also preserves the prompt cache. When a deferred tool is discovered, its definition is appended inline, so the cacheable prefix stays untouched. Naively rebuilding the tool list at each step, by contrast, changes that prefix and busts the cache.
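A toy model shows why appending preserves the cache. It assumes the cache can reuse the longest run of identical leading segments between consecutive requests; real prompt caches operate on tokens and cache breakpoints, and the segment order and tool names here are illustrative.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count leading segments that are identical in both requests."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


system = "system prompt"
eager_tools = ["tool:search_issues", "tool:create_pr"]  # hypothetical tools
turn_1 = [system, *eager_tools, "user: find the flaky test"]

# JIT: the discovered definition is appended inline after the existing history.
turn_2_jit = turn_1 + ["tool:run_ci (discovered)", "user: rerun it"]

# Naive rebuild: the tool list is regenerated with the new tool near the top,
# shifting every segment after the system prompt.
turn_2_rebuild = [system, "tool:run_ci", *eager_tools,
                  "user: find the flaky test", "user: rerun it"]

print(cached_prefix_len(turn_1, turn_2_jit))      # 4 (all of turn 1 reused)
print(cached_prefix_len(turn_1, turn_2_rebuild))  # 1 (only the system prompt)
```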

2 The silent failure: search-miss

JIT's failure mode is invisible. If your tool descriptions use jargon the model won't search for, the agent reports “no tool available” when one is right there.

Retrieval is the dominant MCP failure mode

One benchmark found retrieval errors account for nearly half of all failures in MCP agent tasks across diverse tool sets. Selection accuracy also degrades significantly past 30–50 visible tools. Audit description craft before you defer — and keep the catalog small.
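Because search misses are silent, it helps to test for them before deferring. The sketch below is a stand-in under loud assumptions: it ranks tools by plain word overlap, which is cruder than whatever matching your host actually uses, and search, audit_catalog, the tool names, their descriptions, and the 30-tool warning threshold (the low end of the range above) are all hypothetical.

```python
import re

VISIBLE_TOOL_WARNING = 30  # selection accuracy degrades past roughly 30-50 tools


def tokenize(text: str) -> set[str]:
    """Lowercase word set; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, catalog: dict[str, str], k: int = 3) -> list[str]:
    """Crude tool search: rank tools by words shared between query and description."""
    q = tokenize(query)
    scored = [(len(q & tokenize(desc)), name) for name, desc in catalog.items()]
    # Tools sharing no words are never returned, which is the search miss.
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
    return [name for _, name in hits[:k]]


def audit_catalog(catalog: dict[str, str], cases: list[tuple[str, str]], k: int = 3) -> dict:
    """Check realistic task phrasings against the tool each one should find."""
    misses = [task for task, expected in cases if expected not in search(task, catalog, k)]
    return {
        "miss_rate": len(misses) / len(cases),
        "missed_tasks": misses,
        "too_many_tools": len(catalog) > VISIBLE_TOOL_WARNING,
    }


task = "list open pull requests for this repository"
jargon = {"ghx_pr_enum": "Enumerates PR entities via GQL cursor pagination."}
plain = {"list_pull_requests": "List open pull requests in a repository, with author and status."}

print(audit_catalog(jargon, [(task, "ghx_pr_enum")])["miss_rate"])        # 1.0
print(audit_catalog(plain, [(task, "list_pull_requests")])["miss_rate"])  # 0.0
```

The jargon description shares no words with how a user phrases the task, so the tool never surfaces even though it does exactly the job; the plain description matches on five words.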

3 Discovery scopes and version stability

Server definitions can live at user, project/workspace, and local scope. When the same name appears at multiple scopes, hosts apply most-specific-wins — the closest scope defining a name silently disables the outer ones. Two consequences for a server author:

The remote-vs-local choice from Lesson 1 also locks in your auth path: remote forces OAuth, which forces the client-registration decision (CIMD, Client ID Metadata Documents, vs. DCR, Dynamic Client Registration), which in turn shapes whether you need a credential vault. Sequence those decisions deliberately, because each one forecloses options for the next.
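The most-specific-wins rule from the start of this section can be sketched as a merge over scopes. The sketch assumes precedence runs local over project over user, and resolve_servers and the sample configs are hypothetical. Unlike a host that shadows silently, it returns a warning for every name an inner scope overrides, which is the check you want when planning names that won't collide with a teammate's.

```python
# Assumed precedence, outermost first: later scopes override earlier ones.
SCOPE_ORDER = ["user", "project", "local"]


def resolve_servers(
    configs: dict[str, dict[str, dict]],
) -> tuple[dict[str, dict], list[str]]:
    """Merge per-scope server definitions; the most specific scope wins.

    configs maps a scope name to {server_name: definition}. Returns the
    effective definitions plus one warning per shadowed name.
    """
    effective: dict[str, dict] = {}
    defined_at: dict[str, str] = {}
    warnings: list[str] = []
    for scope in SCOPE_ORDER:
        for name, definition in configs.get(scope, {}).items():
            if name in effective:
                warnings.append(f"{scope} '{name}' shadows {defined_at[name]} '{name}'")
            effective[name] = definition
            defined_at[name] = scope
    return effective, warnings


# Hypothetical configs: a local override reuses the name "github".
configs = {
    "user": {"github": {"url": "https://example.com/mcp"}},
    "project": {"tracker": {"command": "tracker-mcp"}},
    "local": {"github": {"command": "github-mcp-dev"}},
}
servers, notes = resolve_servers(configs)
print(servers["github"])  # {'command': 'github-mcp-dev'}
print(notes)              # ["local 'github' shadows user 'github'"]
```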

↪ Your win: discoverable, affordable, stable

Retrieval practice — recall, don't peek

Question 1 · Anthropic's floor for enabling tool search is roughly…

Question 2 · The silent failure mode of just-in-time loading is…

Question 3 · When one server name appears at several scopes, hosts apply…

Question 4 · Deferred tool definitions preserve the prompt cache because they…

Question 5 · Spaced recall from Lesson 4 · For coding agents, the leg of the lethal trifecta to remove first is…

Next: the Capstone, a decision table that sequences every choice from this course into one server.