Context Engineering · ~7 min
A monolithic system prompt pays for every section on every turn. Build the context deliberately instead — per mode, per phase, by delta.
Earlier in Part 3 you saw every preloaded token displace a reasoning token (Lesson 16). The fix isn't a shorter prompt — it's a composed one: include only what this mode, this phase, and this step actually use. Four assembly disciplines, one principle.
A single static prompt works for a simple agent. As capabilities grow it accrues sections — identity, code-quality rules, safety constraints, provider quirks — and every conversation pays the token cost for every section, relevant or not.
Each section carries a numeric priority that sets assembly order. Planning mode omits code-quality rules; execution mode omits planning heuristics — so irrelevant instructions never consume context or attention. Provider-specific blocks (a Claude-only directive) inject conditionally, without bloating the prompt for other providers.
Tie-in to caching: identity and tool schemas assemble first so the stable prefix never shifts; dynamic sections sit at the end. A conditionally included section placed early in the priority order invalidates the cache for every token after it — modular composition can defeat the caching it was meant to enable if dynamic content leaks into the prefix.
When an agent produces poor output, the instinct is to fix the prompt or swap the model. A better lever is the context bundle delivered to that agent for that phase. Different roles need different information.
| Plan | architecture, constraints, high-level task — not file contents |
| Work | approved plan excerpt, exact file excerpts, validation command |
| Review | original spec, the diff, explicit verification criteria |
Orchestrators need condensed summaries — enough to route and decompose; file contents waste attention on decisions they don't make. Workers need targeted, granular context — the exact files they'll edit and the command that confirms correctness. Give both the same bundle and you get drift: planners distracted by implementation detail, workers carrying planning artifacts that crowd out actionable context.
In the migration example, planner, implementer, and reviewer each operate with under 3,000 tokens of input — none receives the full project history. The reviewer gets the spec and the diff, not the implementer's intermediate drafts.
Don't ask one prompt to plan, implement, and validate at once. Decompose into a sequence of LLM calls, each with a single responsibility, each taking the previous output as input — and put a programmatic gate between steps.
The gate is the defining feature: it intercepts errors at the point they occur rather than at final output. A call that plans and implements simultaneously defeats the chain — errors in either phase contaminate the other, and the gate between them can't fire.
Each call is a fresh stochastic failure point, so a long chain's clean-completion probability falls with every step — a cascading-failure risk. Gates only catch what a program can check; a plausible-but-wrong output that passes its gate still propagates. Keep chains as short as the task allows.
When an agent learns a strategy and you fold it back in by rewriting the whole context, the LLM systematically drops the hard-won specifics — error-recovery sequences, tool ordering — because they read as verbose next to high-level guidance. That's brevity bias, and it compounds.
Measured runs collapsed a working context from 18,282 tokens to 122 over repeated full rewrites — a 9.6-point accuracy drop. Each cycle feeds on the last and discards more nuance. That's context collapse.
Instead, accumulate structured delta entries — itemized units, each one strategy or failure mode, each with a unique ID and helpful/harmful counters. Updating one strategy doesn't regenerate the context; entries persist independently and merge through deterministic, non-LLM logic. Consistently useful entries surface; harmful ones are deprioritized — no labels required. It's the same logic as composition: change the part, not the whole.
Retrieval practice — recall, don't peek
Question 1The cost of a monolithic system prompt is that…
Question 2A planner agent should usually receive…
Question 3The defining feature of a prompt chain is…
Question 4Folding new strategies in by full rewrite tends to cause…
Question 5 · spaced recall from Lesson 16Every token preloaded into context…
compose_prompt skeleton for a harness, or how
phase assembly and prompt chaining fit together (each chain step is a phase with its own context bundle)? Next: Measure Before You Optimize — why token-saving tricks routinely cost you accuracy.