MCP Server Design · ~6 min
You have a corpus the agent might need. The temptation is to load it. The right move is to make it searchable and let the agent pull only what the current step requires.
Every token loaded at session start is a token that can't be used for reasoning, intermediate output, or tool results. An agent that might consult five docs doesn't need all five in context — it needs to know they exist and how to fetch them.
Structure what your server contributes to the agent's context in two layers:
| Layer | What goes in | When loaded |
|---|---|---|
| Startup | Tool names, descriptions, server instructions | Session start |
| On-demand | Document contents, search results, records | When a task step needs them |
The agent starts lean. Your tool descriptions tell it what's available. When a step needs specific knowledge, it issues a tool call to retrieve it — and nothing enters the prompt until the agent asks.
The agent receives only names and descriptions at startup. Content arrives on the call.
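As a minimal sketch of the two layers, the plain-Python snippet below models a server with a `search_docs` / `read_doc` tool pair. It does not use any real MCP SDK. The `TOOLS` registry, the `startup_manifest` and `call_tool` helpers, and the in-memory `CORPUS` are all hypothetical names chosen for illustration.

```python
# Hypothetical in-memory corpus; a real server would back this with files or a database.
CORPUS = {
    "auth": "Auth guide: rotate API keys every 90 days and scope them per service.",
    "deploy": "Deploy guide: build the image, run migrations, then shift traffic.",
    "billing": "Billing guide: invoices are issued on the first day of each month.",
}


def search_docs(query: str) -> list[str]:
    """Return IDs of docs whose text contains every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    return [doc_id for doc_id, text in CORPUS.items()
            if all(word in text.lower() for word in words)]


def read_doc(doc_id: str) -> str:
    """Return the full text of one doc by ID."""
    return CORPUS[doc_id]


# Startup layer: only names and descriptions are registered here.
TOOLS = {
    "search_docs": (search_docs, "Find doc IDs by keyword. Returns IDs only."),
    "read_doc": (read_doc, "Fetch one doc's full text by ID."),
}


def startup_manifest() -> list[dict]:
    """What the agent sees at session start: no document content."""
    return [{"name": name, "description": desc} for name, (_, desc) in TOOLS.items()]


def call_tool(name: str, **kwargs):
    """On-demand layer: content enters context only when a tool is called."""
    func, _ = TOOLS[name]
    return func(**kwargs)
```

Under these assumptions, an agent would first call `call_tool("search_docs", query="migrations")` to learn which doc is relevant, then `call_tool("read_doc", doc_id="deploy")` to pull that one document into context.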
A task needing one of five doc sections consumes context for that section alone. A task needing none consumes zero documentation tokens — no matter how large your corpus grows.
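To make the arithmetic concrete, here is a rough sketch comparing the two strategies. The section names and token counts are invented for illustration, and `preload_cost` and `jit_cost` are hypothetical helpers, not part of any library.

```python
# Hypothetical token counts for five doc sections (illustrative numbers only).
SECTION_TOKENS = {
    "auth": 1200,
    "deploy": 1800,
    "billing": 900,
    "logging": 1500,
    "faq": 2600,
}


def preload_cost(sections: dict[str, int]) -> int:
    """Preloading pays for every section, whether or not the task uses it."""
    return sum(sections.values())


def jit_cost(sections: dict[str, int], needed: list[str]) -> int:
    """Just-in-time pays only for the sections the task actually fetches."""
    return sum(sections[name] for name in needed)


print(preload_cost(SECTION_TOKENS))          # 8000
print(jit_cost(SECTION_TOKENS, ["deploy"]))  # 1800
print(jit_cost(SECTION_TOKENS, []))          # 0
```

The sketch leaves out the small fixed cost of the tool names and descriptions, which the startup layer pays regardless of the task.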
Just-in-time only pays off when what you return is correct. On-demand retrieval has a second failure mode beyond latency: when the retriever surfaces semantically-similar-but-wrong chunks, accuracy drops rather than improves.
In one study, as a corpus grew from 54 to 1,128 documents, dense similarity search returned plausible-looking but contextually wrong results — and accuracy collapsed. A noisy retriever spends budget on distractors and degrades the very reasoning it was meant to protect.
So the server author's job isn't just "expose the data"; it's "expose it so the right chunk comes back." That means tight tool descriptions, scoped results, and returning only what the agent needs for its next decision (you'll size and shape those returns in Lessons 3 and 5).
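One way to read "scoped results" is sketched below: cap the number of hits, drop weak matches, and return short snippets rather than whole documents. The word-overlap scorer, the `min_score` threshold, the `max_results` and `snippet_chars` parameters, and `EXAMPLE_CORPUS` are all assumptions made for this example, not a recommended retrieval method.

```python
# Hypothetical corpus used only to exercise the sketch.
EXAMPLE_CORPUS = {
    "deploy": "run database migrations before you shift traffic",
    "rollback": "revert database migrations after a failed deploy",
    "billing": "invoices are issued monthly",
}


def score(query: str, text: str) -> float:
    """Fraction of distinct query words found in the text.

    Assumption: crude word overlap stands in for a real retriever.
    """
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def search_docs(query: str, corpus: dict[str, str], max_results: int = 2,
                min_score: float = 0.5, snippet_chars: int = 60) -> list[dict]:
    """Return at most max_results hits scoring >= min_score, each with a short snippet."""
    scored = [(score(query, text), doc_id) for doc_id, text in corpus.items()]
    # Drop weak matches so low-relevance distractors never reach the agent.
    kept = [(s, doc_id) for s, doc_id in scored if s >= min_score]
    # Highest score first; ties broken by ID so the output is deterministic.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [
        {"id": doc_id, "score": s, "snippet": corpus[doc_id][:snippet_chars]}
        for s, doc_id in kept[:max_results]
    ]
```

With the example corpus, the query `"database migrations traffic"` returns `deploy` ahead of `rollback`, and raising `min_score` to `0.9` drops `rollback` entirely. A query with no sufficiently strong match returns an empty list rather than a best-effort guess.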
Retrieval practice — recall, don't peek
Question 1: In the two-layer model, what loads at session start is…
Question 2: The main cost of preloading a corpus into context is…
Question 3: A noisy retriever surfacing similar-but-wrong chunks tends to…
Question 4: Just-in-time retrieval fits best for tasks that are…
Question 5 (spaced recall from Lesson 1): Read-only context the agent shouldn't have to call for is exposed as a…
Next in Part 2: "The Right Call, Obvious" (naming, schemas, and errors that make the agent pick correctly).