Assembling the Prompt

A monolithic system prompt pays for every section on every turn. Build the context deliberately instead — per mode, per phase, by delta.

Why this, for you: the prompt isn't a static document you write once — it's an artifact your harness assembles at runtime. This is the orchestration-design move (#2): stop hand-tuning one giant prompt, start composing the right context for the job in front of the model. Authoring payoff too (#3).

Earlier in Part 3 you saw every preloaded token displace a reasoning token (Lesson 16). The fix isn't a shorter prompt — it's a composed one: include only what this mode, this phase, and this step actually use. Four assembly disciplines, one principle.

1 Compose by mode, not monolith

A single static prompt works for a simple agent. As capabilities grow it accrues sections — identity, code-quality rules, safety constraints, provider quirks — and every conversation pays the token cost for every section, relevant or not.

Assemble the prompt at runtime from priority-ordered modular sections, including only what applies to the current mode, provider, and session state.

Each section carries a numeric priority that sets assembly order. Planning mode omits code-quality rules and execution mode omits planning heuristics, so irrelevant instructions never consume context or attention. Provider-specific blocks (for example, a Claude-only directive) are injected conditionally, so they don't bloat the prompt for other providers.

# sections sorted by priority; filtered by mode + provider
10  identity         # stable; cacheable prefix
45  output format    # stable; cacheable prefix
60  code-quality     # mode=execution only
75  plan-first       # mode=planning only; mutually exclusive with 60
80  <claude> hint    # provider=anthropic only
90  session state    ← dynamic; MUST come last (Lesson 07)
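A minimal Python sketch of the priority-ordered assembly above, assuming a hypothetical `Section` record with optional mode and provider filters. The section texts, mode names, and the `compose_prompt` signature are illustrative, not any specific harness's API; session state is passed separately and always appended last, standing in for priority 90.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    priority: int                        # lower number = assembled earlier
    name: str
    text: str
    modes: frozenset = frozenset()       # empty = applies in every mode
    providers: frozenset = frozenset()   # empty = applies to every provider

    def applies(self, mode: str, provider: str) -> bool:
        return ((not self.modes or mode in self.modes)
                and (not self.providers or provider in self.providers))


def compose_prompt(sections, mode, provider, session_state):
    """Keep only applicable sections, sort by priority, append session state last."""
    chosen = sorted((s for s in sections if s.applies(mode, provider)),
                    key=lambda s: s.priority)
    parts = [s.text for s in chosen]
    parts.append(session_state)  # dynamic content always goes at the very end
    return "\n\n".join(parts)


# Hypothetical section registry mirroring the table above.
SECTIONS = [
    Section(10, "identity", "You are a coding agent."),
    Section(45, "output format", "Reply in Markdown."),
    Section(60, "code-quality", "Follow the style guide.",
            modes=frozenset({"execution"})),
    Section(75, "plan-first", "Produce a plan before acting.",
            modes=frozenset({"planning"})),
    Section(80, "claude hint", "<claude-specific directive>",
            providers=frozenset({"anthropic"})),
]
```

Planning mode picks up `plan-first` but not `code-quality`; execution on the `anthropic` provider gets `code-quality` plus the Claude hint. In every case identity and output format lead the prompt with identical bytes.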

Tie-in to caching: identity and tool schemas assemble first so the stable prefix never shifts, and dynamic sections sit at the end. A conditionally included section placed early in the priority order changes the prefix whenever it toggles, so every token after it misses the cache. Modular composition can defeat the very caching it was meant to enable if dynamic content leaks into the prefix.
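To keep dynamic content out of the prefix, a harness can lint its section order before shipping. This sketch assumes a provider that caches by exact prefix match and represents sections as hypothetical `(priority, name, is_stable)` tuples; it flags any stable section that sorts after one that can change between requests, since that stable section would miss the cache anyway.

```python
def cache_violations(sections):
    """Return names of stable sections that sort after a conditional/dynamic one.

    Each section is a (priority, name, is_stable) tuple. Conditional sections
    (toggled by mode or provider) count as unstable: they change the prefix.
    """
    violations = []
    seen_unstable = False
    for _priority, name, is_stable in sorted(sections):
        if not is_stable:
            seen_unstable = True
        elif seen_unstable:
            # This stable text follows bytes that vary, so its cache hit is lost.
            violations.append(name)
    return violations
```

An empty result means every stable section sits in the cacheable prefix; a non-empty one names the sections to move ahead of anything conditional.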

2 Assemble by phase

When an agent produces poor output, the instinct is to fix the prompt or swap the model. A better lever is the context bundle delivered to that agent for that phase. Different roles need different information.

Plan	architecture, constraints, high-level task — not file contents
Work	approved plan excerpt, exact file excerpts, validation command
Review	original spec, the diff, explicit verification criteria

Orchestrators need condensed summaries — enough to route and decompose; file contents waste attention on decisions they don't make. Workers need targeted, granular context — the exact files they'll edit and the command that confirms correctness. Give both the same bundle and you get drift: planners distracted by implementation detail, workers carrying planning artifacts that crowd out actionable context.
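One way to make phase assembly explicit is a whitelist of context fields per phase, so nothing reaches an agent by default. The field names below are hypothetical placeholders that mirror the table; the sketch raises rather than silently sending an incomplete bundle.

```python
# Hypothetical per-phase whitelist; field names mirror the Plan/Work/Review table.
PHASE_FIELDS = {
    "plan":   ("architecture", "constraints", "task"),
    "work":   ("plan_excerpt", "file_excerpts", "validation_command"),
    "review": ("spec", "diff", "verification_criteria"),
}


def build_bundle(phase, context):
    """Select only the fields this phase uses; fail loudly if one is missing."""
    fields = PHASE_FIELDS[phase]
    missing = [f for f in fields if f not in context]
    if missing:
        raise KeyError(f"{phase} bundle missing: {missing}")
    # Everything else in `context` (e.g. full project history) is left out.
    return {f: context[f] for f in fields}
```

The whitelist makes drift visible: adding a field to a phase is a deliberate code change, not something that leaks in because it happened to be in shared state.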

Under 3,000 tokens per agent

In the migration example, planner, implementer, and reviewer each operate with under 3,000 tokens of input — none receives the full project history. The reviewer gets the spec and the diff, not the implementer's intermediate drafts.
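A budget like this is easy to enforce at assembly time. The sketch below is a rough guard, assuming about four characters per token for English text; that ratio is only a heuristic, and real counts should come from the provider's tokenizer. `check_budget` and its 3,000-token default are illustrative.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): ~4 characters per token, rounded up.
    # Replace with the provider's tokenizer for accurate counts.
    return (len(text) + 3) // 4


def check_budget(bundle: dict, limit: int = 3000) -> int:
    """Return a bundle's estimated token count, raising if it exceeds the limit."""
    total = sum(estimate_tokens(str(v)) for v in bundle.values())
    if total > limit:
        raise ValueError(f"bundle is ~{total} tokens, over the {limit}-token budget")
    return total
```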

3 Chain with gates between steps

Don't ask one prompt to plan, implement, and validate at once. Decompose into a sequence of LLM calls, each with a single responsibility, each taking the previous output as input — and put a programmatic gate between steps.

draft     → gate: all required sections present?
review    → gate: review approved?
finalise
# a failed gate retries, escalates, or aborts,
# without polluting the next step with bad input

The gate is the defining feature: it intercepts errors at the point they occur rather than at final output. A call that plans and implements simultaneously defeats the chain — errors in either phase contaminate the other, and the gate between them can't fire.
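A minimal sketch of a gated chain in Python, assuming each step is a plain function standing in for an LLM call and each gate is a cheap programmatic check. `run_chain` and the stub steps are hypothetical; this version retries a failed step and then aborts, and leaves escalation out for brevity.

```python
def run_chain(steps, initial_input, max_retries=1):
    """Run (name, call, gate) steps in order; each output must pass its gate."""
    current = initial_input
    for name, call, gate in steps:
        for _ in range(max_retries + 1):
            output = call(current)
            if gate(output):
                break
        else:
            # Gate never passed: stop instead of feeding bad input downstream.
            raise RuntimeError(f"gate failed after step '{name}'")
        current = output
    return current


REQUIRED = ("Summary", "Details")


def draft(task):   # stand-in for a drafting LLM call
    return f"Summary: {task}\nDetails: ..."


def review(doc):   # stand-in for a reviewing LLM call
    return doc + "\nVerdict: approved"


def finalize(doc):
    return doc.replace("Verdict: approved", "").strip()


steps = [
    ("draft", draft, lambda out: all(s in out for s in REQUIRED)),
    ("review", review, lambda out: "Verdict: approved" in out),
    ("finalize", finalize, lambda out: bool(out)),
]
```

Each gate sees only its own step's output, so a draft missing a required section is caught before the reviewer ever runs.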

Gates can't catch everything

Each call is a fresh stochastic failure point, so a long chain's clean-completion probability falls with every step — a cascading-failure risk. Gates only catch what a program can check; a plausible-but-wrong output that passes its gate still propagates. Keep chains as short as the task allows.
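The compounding is easy to quantify under a simplifying assumption: if each step completes cleanly with independent probability p, an n-step chain completes cleanly with probability p^n. With an illustrative p of 0.95 (not a measured figure), that is about 0.77 at five steps, 0.60 at ten, and 0.36 at twenty.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step completes cleanly, assuming independent steps."""
    return per_step ** steps


# 0.95 is an illustrative per-step success rate, not a measurement.
for n in (1, 5, 10, 20):
    print(n, round(chain_success(0.95, n), 2))
```

Real steps are rarely independent, and gates that retry raise the effective per-step rate, but the direction holds: every added step multiplies in another chance to fail.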

4 Grow the playbook by delta

When an agent learns a strategy and you fold it back in by rewriting the whole context, the LLM systematically drops the hard-won specifics — error-recovery sequences, tool ordering — because they read as verbose next to high-level guidance. That's brevity bias, and it compounds.

Monolithic rewrite: 18,282 → 122 tokens

Measured runs collapsed a working context from 18,282 tokens to 122 over repeated full rewrites, accompanied by a 9.6-point accuracy drop. Each cycle feeds on the last and discards more nuance. That's context collapse.

Instead, accumulate structured delta entries — itemized units, each one strategy or failure mode, each with a unique ID and helpful/harmful counters. Updating one strategy doesn't regenerate the context; entries persist independently and merge through deterministic, non-LLM logic. Consistently useful entries surface; harmful ones are deprioritized — no labels required. It's the same logic as composition: change the part, not the whole.
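A minimal sketch of a delta-grown playbook, assuming a hypothetical operation format (`add`, `helpful`, `harmful`) and a simple helpful-minus-harmful score for ranking. The merge is plain deterministic code: adding an existing ID is a no-op, counters change one entry at a time, and rendering sorts by score with the ID as a tie-breaker, so the same deltas always produce the same playbook.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    id: str
    text: str
    helpful: int = 0
    harmful: int = 0

    def score(self) -> int:
        # Assumed ranking rule; any deterministic function of the counters works.
        return self.helpful - self.harmful


class Playbook:
    def __init__(self):
        self.entries: dict[str, Entry] = {}

    def apply_delta(self, delta):
        """Merge operations one entry at a time; the whole is never regenerated."""
        for op in delta:
            kind, entry_id = op["op"], op["id"]
            if kind == "add" and entry_id not in self.entries:
                self.entries[entry_id] = Entry(entry_id, op["text"])
            elif kind == "helpful" and entry_id in self.entries:
                self.entries[entry_id].helpful += 1
            elif kind == "harmful" and entry_id in self.entries:
                self.entries[entry_id].harmful += 1

    def render(self, limit=None):
        """Highest-scoring entries first; ties broken by ID for stable output."""
        ranked = sorted(self.entries.values(), key=lambda e: (-e.score(), e.id))
        if limit is not None:
            ranked = ranked[:limit]
        return "\n".join(f"[{e.id}] {e.text}" for e in ranked)
```

An LLM can still propose the delta operations; what it never does is rewrite the stored entries, so specifics like an error-recovery sequence survive verbatim until their counters push them down.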

↪ Your win: build the context, don't dump it

Compose by priority — modular sections toggled per mode/provider, not one monolith.
Static first, dynamic last — keep the cacheable prefix stable (Lesson 07).
Assemble per phase — summaries to planners, file excerpts + validation commands to workers.
Chain with gates — one job per call, a programmatic check between each, chains kept short.
Grow playbooks by delta — append structured entries; never let the model rewrite the whole.

Retrieval practice — recall, don't peek

Question 1: The cost of a monolithic system prompt is that…

Question 2: A planner agent should usually receive…

Question 3: The defining feature of a prompt chain is…

Question 4: Folding new strategies in by full rewrite tends to cause…

Question 5 · spaced recall from Lesson 16: Every token preloaded into context…

Ask me anything. Want the priority-ordered compose_prompt skeleton for a harness, or how phase assembly and prompt chaining fit together (each chain step is a phase with its own context bundle)? Next: Measure Before You Optimize — why token-saving tricks routinely cost you accuracy.