Don't preload what the agent can fetch when it actually needs it — but know the trade you're making.
Why this, for you: the runtime twin of Lesson 05. Discoverability kept your files lean;
this keeps your session lean — start with almost nothing, pull context as each step demands it. Core to
MCP/harness design (#2) and to fast, focused daily sessions (#1).
An agent researching five doc sites doesn't need all five loaded before the first message — it needs
to know they exist and how to reach them. Speculative preloading either overflows the window or buries the
material in the low-attention middle (Lesson 02).
Structure context in two layers: a small startup set (instructions, conventions, tool
descriptions) loaded once, and an on-demand layer the agent pulls via tool calls only when the
current step requires it. Nothing from the on-demand layer enters the prompt until the agent asks.
On-demand layer: doc pages, file contents, search results, API responses. Loaded when a step needs them.
Mechanisms: MCP servers (data sources exposed as tools), web fetch, file search, and sub-agents (each fetches in its own isolated window and returns only a summary). A task needing 1 of 5 doc sections pays for that 1; a task needing none pays zero.
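To make the two layers concrete, here is a minimal sketch in Python. It is not an MCP client or any particular harness: `TwoLayerContext`, `pull`, the `docs_1`..`docs_5` names, and the 4-characters-per-token estimate are all hypothetical, chosen only to show that a task pays for what it pulls and nothing else.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def rough_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class TwoLayerContext:
    # Startup layer: loaded once and kept small
    # (instructions, conventions, tool descriptions).
    startup: List[str]
    # On-demand layer: name -> zero-argument fetcher, called only on request.
    fetchers: Dict[str, Callable[[], str]]
    # Content the agent has actually pulled so far.
    pulled: Dict[str, str] = field(default_factory=dict)

    def pull(self, name: str) -> str:
        """Fetch a source on first request and keep it; nothing is fetched before this call."""
        if name not in self.pulled:
            self.pulled[name] = self.fetchers[name]()
        return self.pulled[name]

    def prompt(self) -> str:
        """Startup set first, then only the content that was pulled."""
        return "\n\n".join(self.startup + list(self.pulled.values()))

    def token_cost(self) -> int:
        return rough_tokens(self.prompt())


# Five hypothetical doc sections of about 2,000 characters each.
docs = {f"docs_{i}": (lambda i=i: f"[section {i}] " + "x" * 2000) for i in range(1, 6)}

ctx = TwoLayerContext(
    startup=["You are a research agent.", "Tools: pull(name) for docs_1..docs_5."],
    fetchers=docs,
)

cost_none = ctx.token_cost()       # a task that needs no docs pays the startup cost only
ctx.pull("docs_3")
cost_one = ctx.token_cost()        # a task that needs one section pays for that one

preloaded = TwoLayerContext(startup=ctx.startup, fetchers=docs)
for name in docs:
    preloaded.pull(name)
cost_all = preloaded.token_cost()  # speculative preload of all five sections

print(cost_none, cost_one, cost_all)
```

In a real harness the fetchers would be tool calls (an MCP server, web fetch, file search, or a sub-agent returning a summary); the accounting stays the same.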
⚠ The catch: JIT is only free if retrieval is correct
Latency is the obvious cost. The hidden one is retrieval quality: a noisy retriever spends
budget on distractors and degrades the very reasoning it was meant to protect. In one study, accuracy fell from
75% to below 40% as the corpus grew from 54 to 1,128 docs: dense similarity search returned
semantically similar but contextually wrong chunks. On-demand ≠ automatically better.
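One way to see the hidden cost is to measure how much of the pulled budget goes to distractors. The sketch below assumes you have a small evaluation set where each retrieved chunk is labeled relevant or not; the function name, the character-based budget, and the sample chunks are hypothetical and are not taken from the study mentioned above.

```python
from typing import List, Tuple


def budget_spent_on_distractors(chunks: List[Tuple[str, bool]], budget_chars: int) -> float:
    """Fill the budget in rank order and return the fraction of the used
    budget that went to irrelevant chunks.

    chunks: (text, is_relevant) pairs in the order the retriever returned them.
    """
    used = 0
    wasted = 0
    for text, relevant in chunks:
        if used + len(text) > budget_chars:
            break  # the next chunk does not fit; stop pulling
        used += len(text)
        if not relevant:
            wasted += len(text)
    return wasted / used if used else 0.0


# Hypothetical retriever outputs for the same query and a 1,000-character budget.
clean = [("a" * 400, True), ("b" * 400, True)]
noisy = [("d" * 400, False), ("a" * 400, True), ("d" * 400, False)]

clean_waste = budget_spent_on_distractors(clean, budget_chars=1000)
noisy_waste = budget_spent_on_distractors(noisy, budget_chars=1000)
print(clean_waste, noisy_waste)
```

Tracking a number like this as the corpus grows gives an early signal that on-demand retrieval has started filling the window with look-alike chunks.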
Question 4: As the retrieval corpus grew large, accuracy…
Question 5 · spaced recall from Lesson 10: Offloading a payload to disk is…
Ask me anything. Want the MCP two-layer config (tool descriptions at startup, zero document
content)? Or how JIT interacts with prompt caching — pulled content lands in the dynamic tail, so it never
bloats the cached prefix? Next in Part 2: Goal Recitation & Error Preservation — keeping a long
session on-objective.