Context Engineering · ~7 min · the economics edge

The Immutable Prefix

The same assembly order that shapes attention also sets your bill — and one careless byte in the prefix silently charges you full price on every turn.

Why this, for you: caching is where harness design (priority #2) meets the cost line. It's not a toggle you flip afterward — it's a structural constraint on how you compose context. And it reframes a decision you already know (assembly order) around a second objective: money and latency.

Prompt caching reuses the model's computation for an exact, byte-identical prefix. On Anthropic, a cached read costs ~10% of base price; a cache write costs 125–200%. Manus calls cache hit rate "the single most important metric for a production agent" — a 10× differential.

Design context as an immutable prefix + a growing tail: static content (system prompt, tool definitions, project instructions) first and unchanging; variable content (history, latest message) last. The layout — not a config flag — decides whether you pay 10% or 100% every turn.

Cached prefix — stable across turns

System prompt
Tool definitions
Project instructions (CLAUDE.md/AGENTS.md)

Dynamic tail — grows each turn

Conversation history
Latest user message
Tool results

← reused at ~10%

recomputed

Three bytes that bust the cache (silently)

Modifying tool definitions mid-session. Names, descriptions, params — any change invalidates everything after. (Also: non-deterministic tool ordering — sort them.)
Switching models. Model-specific instructions live in the prefix; a swap busts the whole session. Treat it as a context boundary.
Mutating the prefix to carry state. A timestamp, cwd, or config value in an early section re-writes the cache every call. Volatile data belongs in the tail.

~10%

cached read vs base

125–200%

cache write premium

10×

hit-vs-miss differential

0 errors

a miss is charged silently

A real Claude Code SDK bug busted the cache on every call — 12× cost, undetected until someone watched cache_read_input_tokens vs cache_creation_input_tokens. Misses never throw; they just bill.

The tension with everything you've learned

Attention says "rules at both edges." Caching says "static first, variable last." Do they fight?

At the front, they agree — your stable rules in primacy are good for attention and sit in the cached prefix.

At the back, they don't actually collide — because they work at different scopes. The whole instruction block is static, so its internal tail (your "critical rules, read last" restatement) is still inside the cached prefix. The truly variable content — history, the new message — comes after the entire instruction block.

The resolution is one rule: keep the instruction block static and put its critical rules at its own edges; push anything volatile (timestamps, cwd, per-session data) out of the prefix entirely. Volatile-in-prefix is the one move that loses on both axes — it busts the cache and adds noise.

↪ Your win: build a stable prefix and watch the meter

Immutable prefix, dynamic tail. System prompt + tool defs + instructions never mutate mid-session.
No volatile content in the prefix — no timestamps, cwd, or per-call personalization. Push it to the tail.
Sort tool definitions deterministically — non-deterministic order is a silent cache miss every call.
Monitor cache_read vs cache_creation — a mid-session creation spike means something mutated the prefix.
Compact by forking: keep the prefix, append the summary as new tail content — don't rebuild from scratch.

Retrieval practice — recall, don't peek

Question 1A cached-prefix read costs about…

Question 2Which busts the prompt cache?

Question 3For cache efficiency, variable content goes…

Question 4A prompt-cache miss…

Question 5 · spaced recall from Lesson 06Adding more instruction layers past the ceiling tends to…

Ask me anything. Want the break-even math for your session shape (the 62.5-minute 1-hour-TTL rule), or to check whether any always-loaded file in content/ smuggles volatile content into the prefix? That last one is exactly what the skill's new CE-9 check now hunts for.