Found and Versioned

A great server nobody can find is dead weight. So is one that floods the window. Discoverability and token cost are two sides of the same budget — and they shape how your server evolves.

Why this, for you: your server's tool schemas are injected into the agent's context at startup, and they cost real tokens whether or not anyone calls them. Designing for discovery — small surface, search-friendly descriptions, a clear load policy — is what keeps your server installed instead of evicted for budget.

That cost is not small. The GitHub MCP server alone has been measured at roughly 55,000 tokens across its 93 tool definitions, and stacked servers can consume a third or more of a 200K window before any user input. Discovery is the mechanism that makes a large catalog affordable.
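To make the budget concrete, here is a small arithmetic sketch using only the figures quoted above. The second server and its 15,000-token size are hypothetical, added purely to show how stacking reaches a third of the window.

```python
# Figures quoted in the text.
CONTEXT_WINDOW = 200_000
GITHUB_TOKENS = 55_000
GITHUB_TOOLS = 93

# Hypothetical second server, for illustration only.
OTHER_SERVER_TOKENS = 15_000


def context_share(definition_tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window consumed by tool definitions."""
    return definition_tokens / window


per_tool = GITHUB_TOKENS / GITHUB_TOOLS
github_share = context_share(GITHUB_TOKENS)
stacked_share = context_share(GITHUB_TOKENS + OTHER_SERVER_TOKENS)

print(f"about {per_tool:.0f} tokens per tool")
print(f"GitHub server alone: {github_share:.1%} of the window")
print(f"with the hypothetical second server: {stacked_share:.1%}")
```

Run the same calculation against your own measured definition sizes before deciding how much of the window your server deserves.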

1 Eager vs. just-in-time loading

Modern hosts let each server choose whether its tools live in context from turn one (eager) or sit behind a tool-search step that loads them on first reference (just-in-time). Claude Code exposes this as alwaysLoad; the API has a per-tool defer_loading. Four signals decide:

Signal	Eager (`alwaysLoad: true`)	JIT (default)
Hit rate	Used most sessions	Used in <20% of sessions
Definition size	Small vs. budget	Large enough to dominate
Latency tolerance	Round-trip is felt	Search step is acceptable
Description quality	Keywords don't match tasks	Trustworthy under search

Anthropic's floor: enable tool search at 10+ tools or >10K tokens of definitions. Below that, eager-load everything. Above it, still keep your 3–5 most-used tools non-deferred for best performance.
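The floor and the "keep your most-used tools loaded" rule can be sketched as a small planning function. This is a minimal sketch, not a host API: the `Tool` record, the `sessions_used` counter, and `plan_loading` are hypothetical names, and the output is simply a map from tool name to the `defer_loading` flag you would set.

```python
from dataclasses import dataclass

TOOL_COUNT_FLOOR = 10   # "10+ tools" from the text
TOKEN_FLOOR = 10_000    # ">10K tokens of definitions" from the text


@dataclass
class Tool:
    name: str
    definition_tokens: int  # measured size of schema plus description
    sessions_used: int      # hypothetical counter from your own telemetry


def plan_loading(tools: list[Tool], keep_eager: int = 5) -> dict[str, bool]:
    """Return {tool name: defer_loading} for a catalog."""
    total_tokens = sum(t.definition_tokens for t in tools)
    use_search = len(tools) >= TOOL_COUNT_FLOOR or total_tokens > TOKEN_FLOOR
    if not use_search:
        # Below the floor: eager-load everything.
        return {t.name: False for t in tools}
    # Above the floor: keep the most-used tools in context, defer the rest.
    by_usage = sorted(tools, key=lambda t: t.sessions_used, reverse=True)
    eager = {t.name for t in by_usage[:keep_eager]}
    return {t.name: t.name not in eager for t in tools}


small_catalog = [Tool(f"tool_{i}", 500, 10) for i in range(4)]
large_catalog = [Tool(f"tool_{i}", 1_000, i) for i in range(12)]

small_plan = plan_loading(small_catalog)
large_plan = plan_loading(large_catalog)
print(sum(large_plan.values()), "of", len(large_plan), "tools deferred")
```

Hit rate is the only signal modeled here; latency tolerance and description quality still need a human judgment call on top of the plan.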

Deferred loading preserves the prompt cache: deferred tool definitions append inline when discovered, so the cacheable prefix is untouched — unlike naively rebuilding the tool list per step, which busts the cache.
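A toy model shows why appending preserves the cache while rebuilding does not. It assumes, as a simplification, that a cache hit covers only the leading run of context blocks that is identical between requests; the block labels and `shared_prefix_len` are illustrative, and real caching operates on provider-specific token prefixes.

```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading blocks that are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


# Context sent on the first turn (labels are illustrative).
turn_one = ["system", "tool:search", "tool:read_file", "user:find the bug"]

# Deferred loading: the discovered definition is appended after existing content.
appended = turn_one + ["tool:create_issue", "user:file it"]

# Naive rebuild: the tool list is regenerated near the top of the request.
rebuilt = [
    "system",
    "tool:create_issue",
    "tool:search",
    "tool:read_file",
    "user:find the bug",
    "user:file it",
]

print("reusable blocks when appending:", shared_prefix_len(turn_one, appended))
print("reusable blocks when rebuilding:", shared_prefix_len(turn_one, rebuilt))
```

In this model the appended request reuses everything from the first turn, while the rebuilt request diverges right after the system block.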

2 The silent failure: search-miss

JIT's failure mode is invisible. If your tool descriptions use jargon the model won't search for, the agent reports “no tool available” when one is right there.

Retrieval is the dominant MCP failure mode

One benchmark found retrieval errors account for nearly half of all failures in MCP agent tasks across diverse tool sets. Selection accuracy also degrades significantly past 30–50 visible tools. Audit description craft before you defer — and keep the catalog small.
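One way to audit description craft before deferring is to check whether the phrasings you expect from users share any vocabulary with each tool's description. The sketch below uses naive exact-keyword overlap as a stand-in for the host's real tool search, which will behave differently; the tool names, descriptions, and phrasings are made up for illustration.

```python
import re

STOPWORDS = {"a", "an", "the", "to", "for", "of", "in", "on", "and", "my"}


def keywords(text: str) -> set[str]:
    """Lowercased alphabetic words in the text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def search_misses(
    descriptions: dict[str, str],
    task_phrasings: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Return (tool, phrasing) pairs that share no keyword with the description."""
    misses = []
    for tool, phrasings in task_phrasings.items():
        description_words = keywords(descriptions[tool])
        for phrasing in phrasings:
            if not keywords(phrasing) & description_words:
                misses.append((tool, phrasing))
    return misses


# Hypothetical catalog: the first description is written in insider jargon.
descriptions = {
    "create_pr": "Opens a merge proposal against the default ref",
    "list_issues": "List open issues and bug reports in a repository",
}
task_phrasings = {
    "create_pr": ["open a pull request"],
    "list_issues": ["show open bugs"],
}

misses = search_misses(descriptions, task_phrasings)
print(misses)
```

Here the jargon-heavy `create_pr` description never mentions "pull" or "request", so the phrasing a user would actually type finds nothing. Rewriting the description in task vocabulary clears the miss.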

3 Discovery scopes and version stability

Server definitions can live at user, project/workspace, and local scope. When the same name appears at multiple scopes, hosts apply most-specific-wins — the closest scope defining a name silently disables the outer ones. Two consequences for a server author:

Name for intent. If a developer's github points at a private fork and the team's at the public registry, dedup silently disables one. Shadowing is for overrides of the same intent; parallel intents need distinct names.
Version through descriptions and types, not tool names. Names should carry no version numbers (Lesson 3). When an upstream API gains a new value, a hard-coded enum rejects it until you redeploy; prefer thin string types on fast-churning fields so your surface stays stable across the agent's training gap.
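Both consequences can be sketched in a few lines. The first half models most-specific-wins with an assumed precedence of local, then project, then user (check your host's documentation for the actual order); the second contrasts an enum with a thin string type. All server names, URLs, field schemas, and the `draft` value are hypothetical.

```python
# Assumed precedence, most specific first.
SCOPE_ORDER = ["local", "project", "user"]


def resolve_servers(configs: dict[str, dict[str, str]]) -> dict[str, tuple[str, str]]:
    """Map each server name to the (scope, target) that wins."""
    resolved: dict[str, tuple[str, str]] = {}
    for scope in SCOPE_ORDER:
        for name, target in configs.get(scope, {}).items():
            # The first (most specific) definition of a name is kept.
            resolved.setdefault(name, (scope, target))
    return resolved


# Same name, different intent: one definition silently disappears.
shadowed = resolve_servers({
    "user": {"github": "https://example.com/private-fork/mcp"},
    "project": {"github": "https://example.com/public/mcp"},
})

# Distinct names for parallel intents: both survive.
distinct = resolve_servers({
    "user": {"github-fork": "https://example.com/private-fork/mcp"},
    "project": {"github": "https://example.com/public/mcp"},
})

# Enum versus thin string type on a fast-churning field.
STRICT_STATUS = {"type": "string", "enum": ["open", "closed"]}
THIN_STATUS = {
    "type": "string",
    "description": "Issue status such as open or closed; passed through upstream.",
}


def accepts(schema: dict, value: object) -> bool:
    """Tiny stand-in for JSON Schema validation of a string field."""
    if not isinstance(value, str):
        return False
    return "enum" not in schema or value in schema["enum"]


print(shadowed)
print(sorted(distinct))
print(accepts(STRICT_STATUS, "draft"), accepts(THIN_STATUS, "draft"))
```

The thin type trades early validation for stability: the upstream API becomes the source of truth for allowed values, and the description carries the examples the model needs.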

The remote-vs-local choice from Lesson 1 also locks your auth path: remote forces OAuth, which forces the client-registration decision (CIMD vs. DCR), which shapes whether you need a credential vault. Sequence those decisions deliberately — each forecloses the next.

↪ Your win: discoverable, affordable, stable

Keep the catalog small — selection degrades past 30–50 visible tools.
Eager-load below 10 tools / 10K tokens; defer rarely-used servers above it.
Write search-friendly descriptions — search-miss is JIT's silent failure.
Name for intent, not version; most-specific-wins shadows duplicates silently.
Stabilize fast-churning fields with thin types so the surface survives upstream change.

Retrieval practice — recall, don't peek

Question 1. Anthropic's floor for enabling tool search is roughly…

Question 2. The silent failure mode of just-in-time loading is…

Question 3. When one server name appears at several scopes, hosts apply…

Question 4. Deferred tool definitions preserve the prompt cache because they…

Question 5 · spaced recall from Lesson 4. For coding agents, the leg of the lethal trifecta to remove first is…

Ask me anything. Want help classifying a specific server as eager or JIT, or sketching a scope-and-naming plan so your tools don't shadow a teammate's? Next: the Capstone — a decision table that sequences every choice from this course into one server.