Part 3 · Loading & Economics

Context Engineering · ~7 min

Every Token Has a Cost

Context isn't free space to fill — it's a finite budget. Every token you preload is a token you can't spend on reasoning.

Why this, for you: this is the synthesis of the whole token-economics thread. Once you see the window as a budget, every earlier lesson — the dumb zone, signal density, JIT retrieval — becomes one decision: where does this token buy the most reasoning? A harness pattern (#2), and a mental model (#3) that reframes everything before it.

A 200K window sounds like room to spare. But load AGENTS.md, five skill definitions, three reference files, and the system prompt, and the agent can start the task with 150K already gone — leaving ~50K for every tool call, file read, and line of reasoning that follows.
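To make the arithmetic concrete, here is a minimal sketch of that headroom calculation. The per-item sizes are hypothetical, chosen only so that they add up to the 150K in the example above; real sizes depend on your project.

```python
WINDOW_TOKENS = 200_000

# Hypothetical per-item sizes (assumption): picked only to total 150K tokens.
preloaded = {
    "system_prompt": 20_000,
    "AGENTS.md": 30_000,
    "skill_definitions_x5": 40_000,
    "reference_files_x3": 60_000,
}


def headroom(window: int, loaded: dict[str, int]) -> int:
    """Tokens left for tool calls, file reads, and reasoning."""
    return window - sum(loaded.values())


remaining = headroom(WINDOW_TOKENS, preloaded)
print(f"preloaded: {sum(preloaded.values()):,} tokens")
print(f"headroom:  {remaining:,} tokens ({remaining / WINDOW_TOKENS:.0%} of the window)")
```

Running the same tally on your own preload list is a quick way to see how much of the window is spent before the first tool call.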

1 Preloading is opportunity cost

The trap is thinking of an empty window as wasted. It isn't. Every token you load up front displaces a token you could have spent on tool results, intermediate reasoning, and implementation — and that headroom only shrinks as turns accumulate.

Context is a budget. Every preloaded token displaces a token available for work. The question is never "will it fit?" — it's "is this the best thing this token could be doing?"

It's worse than a simple ledger, too. Attention relates every token to every other token, so the number of token-pair relationships grows with the square of the context length. In a fully packed window, attention is spread thinner: early signal competes with late signal, and late-session reasoning gets weaker exactly when you need it most.
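The quadratic growth is easy to check with a rough count. The sketch below counts unordered token pairs, which is a simplification of how attention is actually computed (an assumption for illustration), but it shows the shape: doubling the filled context roughly quadruples the pairs.

```python
def token_pairs(n_tokens: int) -> int:
    """Number of unordered token pairs in a context of n_tokens (simplified model)."""
    return n_tokens * (n_tokens - 1) // 2


half_full = token_pairs(100_000)
full = token_pairs(200_000)
ratio = full / half_full

# Doubling the number of tokens gives roughly four times as many pairs.
print(f"pairs at 100K: {half_full:,}")
print(f"pairs at 200K: {full:,}")
print(f"growth: about {ratio:.1f}x")
```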

The 1% rule, made literal

Claude Code budgets all skill descriptions combined to 1% of the context window (with an 8,000-character fallback cap). Full skill content loads only on invocation. Add more skills and each description must get leaner — the budget is fixed, so the cost is real and visible.
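A fixed budget split across more skills means less room per description. The sketch below shows only that arithmetic. It is not Claude Code's implementation: it assumes the 1% share is measured in tokens against a 200K window, splits it evenly, and ignores the fallback cap.

```python
def per_skill_budget(window_tokens: int, n_skills: int, share_percent: int = 1) -> int:
    """Average description budget per skill when all descriptions share one fixed slice.

    Assumptions: token units, an even split, no fallback cap.
    """
    if n_skills <= 0:
        raise ValueError("n_skills must be positive")
    total = window_tokens * share_percent // 100
    return total // n_skills


for n in (5, 20, 50):
    print(f"{n:>2} skills: ~{per_skill_budget(200_000, n)} tokens per description")
```

Under these assumptions, going from 5 skills to 50 cuts the average description from about 400 tokens to about 40.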

2 Preload vs. on-demand

Two ways a token enters the window. Preload: paid on every task, zero latency — system prompt, project conventions, skill identifiers. On-demand (JIT): paid only when used, costs one tool call — full skill content, file reads, web fetches.

# always-on: just the identifier (~15 tokens)
name: migrate-api
description: "Migrate REST endpoints to the v2 API contract"

# the steps + file paths load ONLY when migrate-api fires
steps:
  - read: [src/api/v1/, src/api/v2/schema.json]
  - run: "npm test -- --testPathPattern=api"

The rule of thumb: preload only what every task needs; load everything else on-demand. Loading reference material "just in case" converts a conditional cost into fixed overhead paid on every single task — even the ones that never touch it.
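One way to reason about that rule is expected cost per task. The sketch below is a simplified model, not a measured one: the 200-token tool-call overhead and the 8,000-token reference file are hypothetical numbers, and latency is ignored.

```python
def preload_cost(size_tokens: int) -> float:
    """Preloaded content is paid in full on every task."""
    return float(size_tokens)


def on_demand_cost(size_tokens: int, p_used: float, call_overhead_tokens: int = 200) -> float:
    """Expected per-task cost when content loads only on the tasks that use it."""
    if not 0.0 <= p_used <= 1.0:
        raise ValueError("p_used must be between 0 and 1")
    return p_used * (size_tokens + call_overhead_tokens)


def cheaper_strategy(size_tokens: int, p_used: float, call_overhead_tokens: int = 200) -> str:
    """Pick the strategy with the lower expected token cost per task."""
    on_demand = on_demand_cost(size_tokens, p_used, call_overhead_tokens)
    return "preload" if preload_cost(size_tokens) <= on_demand else "on-demand"


# Hypothetical 8,000-token reference file.
print(cheaper_strategy(8_000, p_used=1.0))  # needed by every task
print(cheaper_strategy(8_000, p_used=0.1))  # needed by one task in ten
```

In this model, preloading wins only when nearly every task uses the content. For anything used occasionally, the tool-call overhead is small next to the fixed cost of carrying it everywhere.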

3 Spread the budget across layers

Budgeting isn't only about how much you load — it's about where it comes from. The strongest grounding pulls from multiple distinct sources, not one maxed-out signal: file structure, language-server symbols, git history and ADRs, persistent memory, live runtime state. Each layer covers a blind spot the others leave.

Why one source isn't enough

Schema or a file tree alone can't ground an agent in meaning. Two tables with similar names can differ in what they include — and that difference lives in the pipeline code, not the schema. Types say what a function accepts; git says what changed; an ADR says why. No single layer carries all three.

But layers aren't free either, and more is not monotonically better. Retrieval gains shrink as you add sources and can flip to hurt past a threshold — precision drops beyond roughly 10,000 documents and collapses past ~50,000.

Add a layer only when it closes a real production error, not a theoretical gap. Every layer you add is budget spent — and noise risked — whether or not it ever earns its place.

↪ Your win: budget by task, not by what fits

Retrieval practice — recall, don't peek

Question 1: The real cost of preloading a token is that it…

Question 2: You should preload content only when it is…

Question 3: A fully packed context window tends to make reasoning…

Question 4: You should add another context layer only when it…

Question 5 (spaced recall from Lesson 15): Feeding dependency versions into context prevents…

Next: Assembling the Prompt. Composing context per phase and mode, not one static block.