Context Engineering · ~7 min
Context isn't free space to fill — it's a finite budget. Every token you preload is a token you can't spend on reasoning.
A 200K window sounds like room to spare. But load AGENTS.md, five skill definitions, three
reference files, and the system prompt, and the agent can start the task with 150K already gone — leaving ~50K for
every tool call, file read, and line of reasoning that follows.
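To make the arithmetic concrete, here is a minimal sketch of the headroom calculation. The per-item sizes are hypothetical, chosen only so that they total the 150K in the example above; the helper name `remaining_budget` is also an illustration, not part of any tool.

```python
def remaining_budget(window_tokens: int, preloaded: dict[str, int]) -> int:
    """Tokens left for tool calls, file reads, and reasoning after preloading."""
    used = sum(preloaded.values())
    if used > window_tokens:
        raise ValueError("preloaded content exceeds the context window")
    return window_tokens - used


# Hypothetical sizes (assumptions): only the 150K total comes from the lesson.
preloaded = {
    "system_prompt": 20_000,
    "AGENTS.md": 30_000,
    "skill_definitions": 50_000,
    "reference_files": 50_000,
}

headroom = remaining_budget(200_000, preloaded)
print(headroom)  # 50000
```

Running the same calculation against your own setup, with measured token counts in place of the made-up ones, shows how much of the window is spent before the first turn.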
The trap is thinking of an empty window as wasted. It isn't. Every token you load up front displaces a token you could have spent on tool results, intermediate reasoning, and implementation — and that headroom only shrinks as turns accumulate.
It's worse than a simple ledger, too. Self-attention relates every token to every other token, so its cost grows with the square of the token count. A fully packed window spreads attention across far more token pairs: early signal competes with late signal, and late-session reasoning tends to get weaker exactly when you need it most.
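A rough way to see the quadratic effect is to count token pairs. This sketch assumes plain full self-attention, where every token is scored against every token; production systems may use optimizations that change the constant factors, so treat it as an illustration of the growth rate only.

```python
def token_pairs(n_tokens: int) -> int:
    """Ordered token pairs scored by full self-attention: n * n."""
    return n_tokens * n_tokens


light = token_pairs(50_000)    # a lightly loaded window
packed = token_pairs(200_000)  # a fully packed window

# Four times the tokens means sixteen times the pairs.
print(packed // light)  # 16
```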
Claude Code budgets all skill descriptions combined to 1% of the context window (with an 8,000-character fallback cap). Full skill content loads only on invocation. Add more skills and each description must get leaner — the budget is fixed, so the cost is real and visible.
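The squeeze is easy to sketch. The example below splits the 8,000-character fallback cap evenly across skills. The even split is an assumption made for illustration, as is the helper name `per_skill_chars`; the lesson does not say how the budget is actually divided among descriptions.

```python
FALLBACK_CAP_CHARS = 8_000  # fallback cap quoted in the lesson


def per_skill_chars(total_budget_chars: int, n_skills: int) -> int:
    """Average characters available per skill description under an even split."""
    if n_skills <= 0:
        raise ValueError("n_skills must be positive")
    return total_budget_chars // n_skills


for n in (5, 20, 50):
    print(n, per_skill_chars(FALLBACK_CAP_CHARS, n))
# 5 1600
# 20 400
# 50 160
```

At 50 skills each description averages 160 characters, roughly a sentence or two.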
Two ways a token enters the window. Preload: paid on every task, zero latency — system prompt, project conventions, skill identifiers. On-demand (JIT): paid only when used, costs one tool call — full skill content, file reads, web fetches.
The rule of thumb: preload only what every task needs; load everything else on-demand. Loading reference material "just in case" converts a conditional cost into fixed overhead paid on every single task — even the ones that never touch it.
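The rule of thumb can be checked with a back-of-the-envelope comparison. All numbers below are hypothetical: a 10,000-token reference file, 100 tasks of which 20 actually need it, and an assumed 200-token overhead for the tool call that fetches it on demand.

```python
def preload_cost(tokens: int, n_tasks: int) -> int:
    """Fixed overhead: the content is paid for on every task."""
    return tokens * n_tasks


def on_demand_cost(tokens: int, tasks_using_it: int, call_overhead: int = 0) -> int:
    """Conditional cost: paid only by tasks that load the content."""
    return tasks_using_it * (tokens + call_overhead)


# Hypothetical workload (assumptions, not measurements).
fixed = preload_cost(10_000, n_tasks=100)
conditional = on_demand_cost(10_000, tasks_using_it=20, call_overhead=200)

print(fixed)        # 1000000
print(conditional)  # 204000
```

Under these assumptions on-demand loading spends about a fifth of the tokens. The balance tips toward preloading only as the share of tasks that use the content approaches all of them.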
Budgeting isn't only about how much you load — it's about where it comes from. The strongest grounding pulls from multiple distinct sources, not one maxed-out signal: file structure, language-server symbols, git history and ADRs, persistent memory, live runtime state. Each layer covers a blind spot the others leave.
Schema or a file tree alone can't ground an agent in meaning. Two tables with similar names can differ in what they include — and that difference lives in the pipeline code, not the schema. Types say what a function accepts; git says what changed; an ADR says why. No single layer carries all three.
But layers aren't free either, and more is not monotonically better. Retrieval gains shrink as you add sources and can flip to hurt past a threshold — precision drops beyond roughly 10,000 documents and collapses past ~50,000.
Retrieval practice — recall, don't peek
Question 1: The real cost of preloading a token is that it…
Question 2: You should preload content only when it is…
Question 3: A fully packed context window tends to make reasoning…
Question 4: You should add another context layer only when it…
Question 5 · spaced recall from Lesson 15: Feeding dependency versions into context prevents…