Part 2 · Taming the Tail

Context Engineering · ~6 min

Just-in-Time Retrieval

Don't preload what the agent can fetch when it actually needs it — but know the trade you're making.

Why this, for you: the runtime twin of Lesson 05. Discoverability kept your files lean; this keeps your session lean — start with almost nothing, pull context as each step demands it. Core to MCP/harness design (#2) and to fast, focused daily sessions (#1).

An agent researching five doc sites doesn't need all five loaded before the first message — it needs to know they exist and how to reach them. Speculative preloading either overflows the window or buries the material in the low-attention middle (Lesson 02).

Structure context in two layers: a small startup set (instructions, conventions, tool descriptions) loaded once, and an on-demand layer the agent pulls via tool calls only when the current step requires it. Nothing from the on-demand layer enters the prompt until the agent asks.
Layer | What goes in | When
Startup | Instructions, conventions, tool/skill descriptions | Session start
On-demand | Doc pages, file contents, search results, API responses | When a step needs them
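
The two layers can be sketched in a few lines of Python. This is a minimal illustration, not a real harness: the `Session` class, the `fetch` tool name, and the five stub doc sources are all hypothetical. It only shows that the startup prompt carries descriptions of the sources while their contents stay out until a step asks for them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Session:
    """Two-layer context: a fixed startup set plus content pulled on demand."""
    startup: List[str]                       # loaded once at session start
    sources: Dict[str, Callable[[], str]]    # name -> fetcher, not called yet
    pulled: Dict[str, str] = field(default_factory=dict)

    def tool_descriptions(self) -> List[str]:
        # The agent sees only that each source exists and how to reach it.
        return [f"fetch({name!r})" for name in self.sources]

    def fetch(self, name: str) -> str:
        # On-demand layer: content enters the prompt only when asked for.
        if name not in self.pulled:
            self.pulled[name] = self.sources[name]()
        return self.pulled[name]

    def prompt(self) -> str:
        parts = self.startup + self.tool_descriptions() + list(self.pulled.values())
        return "\n".join(parts)


def make_session() -> Session:
    # Five hypothetical doc sites, each represented by a stub fetcher.
    sources = {
        f"docs_{i}": (lambda i=i: f"<contents of doc site {i}>")
        for i in range(1, 6)
    }
    return Session(startup=["Follow the project conventions."], sources=sources)


session = make_session()
before = session.prompt()      # instructions + five tool descriptions only
session.fetch("docs_3")        # one step needs one source
after = session.prompt()       # now also holds that one source's contents
```

Here `before` mentions all five sources by name but contains none of their contents, and `after` contains only the one that was fetched.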

Mechanisms: MCP servers (data sources as tools), web fetch, file search, and sub-agents (each fetches in its own isolated window and returns only a summary). A task needing 1 of 5 doc sections pays for that 1; a task needing none pays zero.
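
The "pays for that 1" arithmetic can be made concrete. The sketch below assumes five doc sections of 2,000 tokens each; those sizes and the function names are made up for illustration, and real sizes will vary.

```python
from typing import Dict, Iterable


def preload_cost(section_tokens: Dict[str, int]) -> int:
    """Tokens spent if every section is loaded before the first message."""
    return sum(section_tokens.values())


def jit_cost(section_tokens: Dict[str, int], needed: Iterable[str]) -> int:
    """Tokens spent if only the sections a task actually uses are fetched."""
    return sum(section_tokens[name] for name in needed)


# Hypothetical sizes: five sections, 2,000 tokens each.
SECTION_TOKENS = {f"section_{i}": 2_000 for i in range(1, 6)}

print(preload_cost(SECTION_TOKENS))               # 10000
print(jit_cost(SECTION_TOKENS, ["section_2"]))    # 2000
print(jit_cost(SECTION_TOKENS, []))               # 0
```

The count covers document content only; the tool descriptions in the startup set are a small fixed cost in both cases.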

⚠ The catch: JIT is only free if retrieval is correct

Latency is the obvious cost. The hidden one is retrieval quality: a noisy retriever spends budget on distractors and degrades the very reasoning it was meant to protect. In one study, accuracy fell from 75% to below 40% as the corpus grew from 54 to 1,128 docs, with dense similarity search returning semantically similar but contextually wrong chunks. On-demand ≠ automatically better.
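
One way to limit how much budget goes to distractors is to gate retrieved chunks before they reach the prompt. The sketch below is an assumption, not a method from the study: `select_chunks`, the score threshold, and the token budget are all hypothetical. It also cannot rescue a retriever whose scores are themselves wrong; it only stops low-scoring chunks from being appended by default.

```python
from typing import List, Tuple

# Each candidate is (chunk_text, relevance_score, token_count).
Candidate = Tuple[str, float, int]


def select_chunks(scored: List[Candidate], min_score: float, token_budget: int) -> List[str]:
    """Keep the highest-scoring chunks that clear min_score and fit the budget."""
    kept: List[str] = []
    used = 0
    for text, score, tokens in sorted(scored, key=lambda c: c[1], reverse=True):
        if score < min_score:
            break                 # everything after this scores lower still
        if used + tokens > token_budget:
            continue              # too big for the remaining budget; try smaller ones
        kept.append(text)
        used += tokens
    return kept


candidates = [
    ("a", 0.90, 300),
    ("b", 0.55, 300),   # below the threshold: treated as a likely distractor
    ("c", 0.80, 500),
    ("d", 0.20, 100),
]
selected = select_chunks(candidates, min_score=0.6, token_budget=800)
print(selected)  # ['a', 'c']
```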

The decision: preload, retrieve, or both

Repetitive access to the same doc → Preload it
Exploratory; relevant subset unknown → Retrieve on-demand
Long-horizon task → Both: preload instructions, retrieve reference, compact when full
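
The three rows above can be written as a lookup table. The enum and function names are hypothetical, and the sketch deliberately covers only the three profiles the lesson lists; a task that fits none of them still needs a judgment call.

```python
from enum import Enum
from typing import Dict, List


class TaskProfile(Enum):
    REPETITIVE = "repetitive access to the same doc"
    EXPLORATORY = "exploratory; relevant subset unknown"
    LONG_HORIZON = "long-horizon task"


# Direct transcription of the decision rows into data.
STRATEGY: Dict[TaskProfile, List[str]] = {
    TaskProfile.REPETITIVE: ["preload the doc"],
    TaskProfile.EXPLORATORY: ["retrieve on demand"],
    TaskProfile.LONG_HORIZON: [
        "preload instructions",
        "retrieve reference",
        "compact when full",
    ],
}


def plan(profile: TaskProfile) -> List[str]:
    """Return the context-loading steps for a given task profile."""
    return STRATEGY[profile]


print(plan(TaskProfile.LONG_HORIZON))
# ['preload instructions', 'retrieve reference', 'compact when full']
```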

↪ Your win: start lean, pull on demand, mind the retriever

Retrieval practice — recall, don't peek

Question 1: JIT retrieval keeps which at startup?

Question 2: Preload (don't retrieve) when…

Question 3: Beyond latency, on-demand retrieval's failure mode is…

Question 4: As the retrieval corpus grew large, accuracy…

Question 5 · spaced recall from Lesson 10: Offloading a payload to disk is…

Ask me anything. Want the MCP two-layer config (tool descriptions at startup, zero document content)? Or how JIT interacts with prompt caching — pulled content lands in the dynamic tail, so it never bloats the cached prefix? Next in Part 2: Goal Recitation & Error Preservation — keeping a long session on-objective.