Part 1 · The Surface

MCP Server Design · ~6 min

Data, Just in Time

You have a corpus the agent might need. The temptation is to load it. The right move is to make it searchable and let the agent pull only what the current step requires.

Why this, for you: a server that dumps data into context starves the agent of the budget it needs to reason. The pattern that wins (expose your data behind search and read tools, retrieve on demand) is also what makes your server cheap enough to leave installed, rather than the one teams uninstall.

Every token loaded at session start is a token that can't be used for reasoning, intermediate output, or tool results. An agent that might consult five docs doesn't need all five in context — it needs to know they exist and how to fetch them.

1 Two layers: startup vs. on-demand

Structure what your server contributes to the agent's context in two layers:

Layer | What goes in | When loaded
Startup | Tool names, descriptions, server instructions | Session start
On-demand | Document contents, search results, records | When a task step needs them

The agent starts lean. Your tool descriptions tell it what's available. When a step needs specific knowledge, it issues a tool call to retrieve it — and nothing enters the prompt until the agent asks.
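To make the two layers concrete, here is a minimal sketch that compares what a lean startup contributes with what a preloading server would contribute. The corpus contents, the `TOOLS` dictionary, and both function names are illustrative assumptions, not part of any MCP SDK; character counts stand in for tokens.

```python
# Hypothetical in-memory corpus; a real server would read from disk or a database.
CORPUS = {
    "api/authentication": "# Authentication\n"
    + "Verify the webhook signature before trusting a payload.\n" * 40,
    "api/rate-limits": "# Rate limits\n"
    + "Back off when the server answers 429.\n" * 40,
}

# Startup layer: tool names and descriptions only.
TOOLS = {
    "search_docs": "Search the docs corpus. Returns top matches with snippets.",
    "read_doc": "Fetch one doc by id from search_docs. Returns full markdown.",
}


def startup_context() -> str:
    """Text the agent sees at session start: one line per tool."""
    return "\n".join(f"{name}: {desc}" for name, desc in TOOLS.items())


def preloaded_context() -> str:
    """The anti-pattern: tool lines plus every document pasted in up front."""
    return startup_context() + "\n" + "\n".join(CORPUS.values())


if __name__ == "__main__":
    print(len(startup_context()), "chars at startup (lean)")
    print(len(preloaded_context()), "chars at startup (preloaded)")
```

The lean figure stays flat as `CORPUS` grows; the preloaded figure grows with every document you add.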

Anthropic frames context as "a precious, finite resource" and the goal as assembling "the smallest set of high-signal tokens that maximize the likelihood of your desired outcome." A server that preloads its corpus is working against that.

2 What this looks like in a manifest

The agent receives only names and descriptions at startup. Content arrives on the call:

# startup: agent sees tool descriptions only, no document content
search_docs  "Search the docs corpus. Returns top matches with snippets."
read_doc     "Fetch one doc by id from search_docs. Returns full markdown."

# on-demand: only when the task needs authentication guidance
search_docs("webhook signature verification")  → 3 snippets
read_doc("api/authentication")                 → 4 KB now in context

A task needing one of five doc sections consumes context for that section alone. A task needing none consumes zero documentation tokens — no matter how large your corpus grows.
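One way the `search_docs` and `read_doc` pair could be implemented is sketched below as plain Python functions. This is a sketch under stated assumptions: the `DOCS` contents are placeholders, the keyword-overlap scoring is a stand-in for whatever retriever you actually use, and registering the functions as tools with your MCP SDK is left out.

```python
import re

# Hypothetical corpus keyed by doc id; contents are placeholders.
DOCS = {
    "api/authentication": "Authentication. Verify the webhook signature "
    "with your signing secret before trusting a payload.",
    "api/rate-limits": "Rate limits. Back off and retry when the API "
    "answers with status 429.",
    "guides/quickstart": "Quickstart. Install the client and send a first request.",
}


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words, used for a naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_docs(query: str, limit: int = 3) -> list[dict]:
    """Return up to `limit` matches as id plus short snippet, never full bodies."""
    query_tokens = _tokens(query)
    scored = []
    for doc_id, body in DOCS.items():
        overlap = len(query_tokens & _tokens(body))
        if overlap:  # documents with no shared words are not returned at all
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [{"id": doc_id, "snippet": DOCS[doc_id][:60]} for _, doc_id in scored[:limit]]


def read_doc(doc_id: str) -> str:
    """Return the full text of one document, or fail with an actionable message."""
    if doc_id not in DOCS:
        raise KeyError(f"unknown doc id {doc_id!r}; call search_docs first")
    return DOCS[doc_id]


if __name__ == "__main__":
    hits = search_docs("webhook signature verification")
    print(hits)
    print(read_doc(hits[0]["id"]))
```

The split is the point of the design: `search_docs` is cheap and returns pointers, while the full body enters context only when the agent follows one of them with `read_doc`.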

3 The failure mode: a noisy retriever

Just-in-time only pays off when what you return is correct. On-demand retrieval has a second failure mode beyond latency: when the retriever surfaces semantically-similar-but-wrong chunks, accuracy drops rather than improves.

Accuracy fell from 75% to under 40%

In one study, as a corpus grew from 54 to 1,128 documents, dense similarity search returned plausible-looking but contextually wrong results — and accuracy collapsed. A noisy retriever spends budget on distractors and degrades the very reasoning it was meant to protect.

So the server author's job isn't just "expose the data"; it's "expose it so the right chunk comes back." That means tight tool descriptions, scoped results, and returning only what the agent needs for its next decision (you'll size and shape those returns in Lessons 3 and 5).
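As an illustration of "scoped results", the sketch below filters retriever output before it reaches the agent: weak matches are dropped, the count is capped, and each snippet is trimmed. The `Candidate` type, the `scope_results` name, and the default thresholds are all assumptions for illustration; sensible values depend on your retriever and need tuning against your own corpus.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    score: float  # similarity from whatever retriever you use; higher is better
    text: str


def scope_results(candidates, min_score=0.5, limit=3, max_chars=200):
    """Drop weak matches, cap the count, and trim each snippet.

    The defaults are illustrative assumptions, not recommended values.
    """
    kept = [c for c in candidates if c.score >= min_score]
    kept.sort(key=lambda c: c.score, reverse=True)
    return [
        {"id": c.doc_id, "score": round(c.score, 2), "snippet": c.text[:max_chars]}
        for c in kept[:limit]
    ]


if __name__ == "__main__":
    raw = [
        Candidate("api/authentication", 0.82, "Verify the webhook signature..."),
        Candidate("api/rate-limits", 0.41, "Back off when the API answers 429..."),
        Candidate("guides/quickstart", 0.38, "Install the client..."),
    ]
    print(scope_results(raw))
```

Returning an empty list when nothing clears the threshold is deliberate in this sketch: an honest "no match" costs the agent less than a plausible-looking distractor.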

↪ Your win: make it findable, not resident

Retrieval practice — recall, don't peek

Question 1: In the two-layer model, what loads at session start is…

Question 2: The main cost of preloading a corpus into context is…

Question 3: A noisy retriever surfacing similar-but-wrong chunks tends to…

Question 4: Just-in-time retrieval fits best for tasks that are…

Question 5 · spaced recall from Lesson 1: Read-only context the agent shouldn't have to call for is exposed as a…

Next in Part 2: The Right Call, Obvious (naming, schemas, and errors that make the agent pick correctly).