Just-in-Time Retrieval

Don't preload what the agent can fetch when it actually needs it — but know the trade you're making.

Why this, for you: the runtime twin of Lesson 05. Discoverability kept your files lean; this keeps your session lean — start with almost nothing, pull context as each step demands it. Core to MCP/harness design (#2) and to fast, focused daily sessions (#1).

An agent researching five doc sites doesn't need all five loaded before the first message — it needs to know they exist and how to reach them. Speculative preloading either overflows the window or buries the material in the low-attention middle (Lesson 02).

Structure context in two layers: a small startup set (instructions, conventions, tool descriptions) loaded once, and an on-demand layer the agent pulls via tool calls only when the current step requires it. Nothing enters the prompt until the agent asks.

Layer	What goes in	When
Startup	Instructions, conventions, tool/skill descriptions	Session start
On-demand	Doc pages, file contents, search results, API responses	When a step needs them
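The two layers can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not an MCP implementation: `Tool`, `TwoLayerContext`, `read_doc`, and the doc pages are hypothetical stand-ins for whatever your harness actually exposes. The point it shows is that the startup prompt carries only instructions and tool descriptions, and a page enters the context only after a tool call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str             # startup layer: only this text is loaded at session start
    fetch: Callable[[str], str]  # on-demand layer: runs only when a step calls the tool


@dataclass
class TwoLayerContext:
    instructions: str
    tools: Dict[str, Tool]
    pulled: List[str] = field(default_factory=list)  # on-demand content; empty at startup

    def startup_prompt(self) -> str:
        # Startup set: instructions plus one line per tool description, no reference content.
        tool_lines = [f"- {t.name}: {t.description}" for t in self.tools.values()]
        return "\n".join([self.instructions] + tool_lines)

    def call_tool(self, name: str, arg: str) -> str:
        # Content enters the context only when the agent asks for it.
        result = self.tools[name].fetch(arg)
        self.pulled.append(result)
        return result

    def prompt(self) -> str:
        return "\n".join([self.startup_prompt(), *self.pulled])


# Hypothetical usage: five doc pages exist, but none are loaded until requested.
DOCS = {slug: f"{slug} guide: full page text..." for slug in ["auth", "billing", "webhooks", "limits", "sdk"]}
ctx = TwoLayerContext(
    instructions="You are a coding assistant. Fetch docs with read_doc when a step needs them.",
    tools={"read_doc": Tool("read_doc", "Return one doc page by slug.", DOCS.__getitem__)},
)
before = ctx.prompt()              # instructions + tool description only
ctx.call_tool("read_doc", "auth")
after = ctx.prompt()               # now also contains the auth page, and nothing else
```

In a real harness the `fetch` callable would be an MCP tool, a web fetch, or a file search; the structure stays the same.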

Mechanisms: MCP servers (data sources exposed as tools), web fetch, file search, and sub-agents (which fetch in an isolated window and return a summary). A task needing 1 of 5 doc sections pays for that 1; a task needing none pays nothing beyond the startup set.
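The cost claim is simple arithmetic. The sketch below uses made-up token counts for five doc sections (illustrative only, not measurements) to compare paying for everything up front with paying per fetch.

```python
# Hypothetical token counts for five doc sections (illustrative, not measured).
SECTION_TOKENS = {"intro": 3_000, "auth": 8_000, "billing": 6_000, "webhooks": 5_000, "limits": 2_000}


def preload_cost(sections: dict) -> int:
    # Preloading pays for every section up front, whether or not a step uses it.
    return sum(sections.values())


def jit_cost(sections: dict, needed: list) -> int:
    # JIT pays only for the sections a step actually fetches
    # (ignoring the small, fixed cost of the tool description in the startup set).
    return sum(sections[name] for name in needed)


print(preload_cost(SECTION_TOKENS))        # 24000
print(jit_cost(SECTION_TOKENS, ["auth"]))  # 8000
print(jit_cost(SECTION_TOKENS, []))        # 0
```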

⚠ The catch: JIT is only free if retrieval is correct

Latency is the obvious cost. The hidden one is retrieval quality: a noisy retriever spends budget on distractors and degrades the very reasoning it was meant to protect. In one study, accuracy fell from 75% to below 40% as the corpus grew from 54 to 1,128 docs: dense similarity search returned chunks that were semantically similar but contextually wrong. On-demand ≠ automatically better.
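One way to watch for this failure mode on your own queries is to measure how much of each retrieval goes to distractors. The sketch below does not reproduce the study above; `retrieval_report`, the chunk ids, and the token counts are hypothetical, and "relevant" is assumed to come from a small hand-labeled eval set.

```python
from typing import List, Set, Tuple


def retrieval_report(retrieved: List[Tuple[str, int]], relevant: Set[str]) -> dict:
    """Summarize how much of the retrieved budget went to distractors.

    retrieved: (chunk_id, token_count) pairs returned by a retriever, in rank order.
    relevant:  chunk ids judged correct for this query (e.g. from a labeled eval set).
    """
    total = sum(tokens for _, tokens in retrieved)
    wasted = sum(tokens for cid, tokens in retrieved if cid not in relevant)
    hits = sum(1 for cid, _ in retrieved if cid in relevant)
    return {
        "precision": hits / len(retrieved) if retrieved else 0.0,
        "tokens_total": total,
        "tokens_on_distractors": wasted,
    }


# Hypothetical top-4 results: two chunks are semantically close but from the wrong product version.
results = [("v2/auth", 800), ("v1/auth", 900), ("v2/tokens", 600), ("v1/tokens", 700)]
report = retrieval_report(results, relevant={"v2/auth", "v2/tokens"})
print(report)  # {'precision': 0.5, 'tokens_total': 3000, 'tokens_on_distractors': 1600}
```

If precision drops as the corpus grows, that is the signal to narrow the corpus or scope the query rather than trust on-demand retrieval by default.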

The decision: preload, retrieve, or both

Repetitive access to the same doc → Preload it

Exploratory; relevant subset unknown → Retrieve on demand

Long-horizon task → Both: preload instructions, retrieve reference, compact when full
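The three rules above can be written as a tiny decision function. This is a sketch, not a prescribed algorithm: `Strategy` and `choose_strategy` are hypothetical names, and checking the long-horizon case first is an assumption about precedence that the rules themselves do not state.

```python
from enum import Enum


class Strategy(Enum):
    PRELOAD = "preload it"
    RETRIEVE = "retrieve on demand"
    BOTH = "preload instructions, retrieve reference, compact when full"


def choose_strategy(repeated_access: bool, long_horizon: bool) -> Strategy:
    # Assumed precedence: a long-horizon task uses both layers plus compaction,
    # even if some docs inside it are accessed repeatedly.
    if long_horizon:
        return Strategy.BOTH
    # The same doc is needed again and again: pay for it once, up front.
    if repeated_access:
        return Strategy.PRELOAD
    # Exploratory work where the relevant subset is unknown: fetch per step.
    return Strategy.RETRIEVE


print(choose_strategy(repeated_access=False, long_horizon=False).value)  # retrieve on demand
```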

↪ Your win: start lean, pull on demand, mind the retriever

Preload instructions + tool descriptions only — not reference content.
Fetch via tool calls (MCP, web fetch, file search) at the step that needs it.
Preload only material the agent accesses repeatedly; retrieve when the relevant subset is unknown.
Watch retrieval quality — a noisy retriever is worse than preloading; narrow the corpus or scope the query.
Delegate retrieval-heavy steps to a sub-agent: it fetches in its own window and returns only a summary (Lesson 10's offload, as an agent).
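The sub-agent pattern in the last item can be sketched without any model at all. Everything here is hypothetical: `run_subagent` stands in for spawning a sub-agent, `PAGES` for the fetched sites, and `first_sentences` for the summary an LLM would write. What the sketch shows is the boundary: full pages live only in the sub-agent's window, and only the summary reaches the parent context.

```python
from typing import Callable, Dict, List


def run_subagent(task: str, sources: List[str], fetch: Callable[[str], str],
                 summarize: Callable[[str, List[str]], str]) -> str:
    # The sub-agent's own window: everything it fetches lives here and is discarded afterwards.
    window: List[str] = [task]
    for src in sources:
        window.append(fetch(src))
    # Only the summary crosses back into the parent's context.
    return summarize(task, window[1:])


# Hypothetical stand-ins: a dict-backed fetch and a naive "first sentence of each page" summarizer.
PAGES: Dict[str, str] = {
    "site-a": "Rate limit is 100 req/min. Long changelog follows...",
    "site-b": "Auth uses bearer tokens. Appendix of examples...",
}


def first_sentences(task: str, pages: List[str]) -> str:
    # A real sub-agent would summarize with respect to `task`; this toy version ignores it.
    return " ".join(p.split(". ")[0] + "." for p in pages)


parent_context = ["System: you are a research agent."]
summary = run_subagent("Collect API limits and auth", ["site-a", "site-b"], PAGES.__getitem__, first_sentences)
parent_context.append(summary)
print(summary)  # Rate limit is 100 req/min. Auth uses bearer tokens.
```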

Retrieval practice — recall, don't peek

Question 1: JIT retrieval keeps which at startup?

Question 2: Preload (don't retrieve) when…

Question 3: Beyond latency, on-demand retrieval's failure mode is…

Question 4: As the retrieval corpus grew large, accuracy…

Question 5 · spaced recall from Lesson 10: Offloading a payload to disk is…

Ask me anything. Want the MCP two-layer config (tool descriptions at startup, zero document content)? Or how JIT interacts with prompt caching — pulled content lands in the dynamic tail, so it never bloats the cached prefix? Next in Part 2: Goal Recitation & Error Preservation — keeping a long session on-objective.
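On the prompt-caching point: a minimal sketch, assuming a cache that can reuse whatever prompt prefix a request shares with the previous one. Real provider caches match on exact token prefixes and have their own breakpoint rules; comparing whole segments here is a simplification, and `build_prompt` and `shared_prefix_len` are hypothetical helpers. Appending pulled content to the tail keeps the startup prefix intact, while splicing it into the startup block breaks the prefix early.

```python
from typing import List


def build_prompt(startup: List[str], history: List[str], pulled: List[str]) -> List[str]:
    # Stable prefix first (instructions, tool descriptions), then the growing dynamic tail.
    return [*startup, *history, *pulled]


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    # Number of leading segments two prompts share: a rough proxy for what a
    # prefix cache could reuse between consecutive requests.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


startup = ["SYSTEM: instructions", "TOOLS: read_doc, search"]
turn1 = build_prompt(startup, ["USER: find the auth docs"], [])
turn2 = build_prompt(startup, ["USER: find the auth docs"], ["TOOL RESULT: auth page ..."])

# Anti-pattern for contrast: splicing fetched content into the startup block.
turn2_bad = build_prompt([startup[0], "TOOL RESULT: auth page ...", startup[1]],
                         ["USER: find the auth docs"], [])

print(shared_prefix_len(turn1, turn2))      # 3
print(shared_prefix_len(turn1, turn2_bad))  # 1
```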
