Context Engineering · ~7 min · the economics edge

The Immutable Prefix

The same assembly order that shapes attention also sets your bill — and one careless byte in the prefix silently charges you full price on every turn.

Why this, for you: caching is where harness design (priority #2) meets the cost line. It's not a toggle you flip afterward — it's a structural constraint on how you compose context. And it reframes a decision you already know (assembly order) around a second objective: money and latency.

Prompt caching reuses the model's computation for an exact, byte-identical prefix. On Anthropic, a cached read costs ~10% of base price; a cache write costs 125–200%. Manus calls cache hit rate "the single most important metric for a production agent" — a 10× differential.

Design context as an immutable prefix + a growing tail: static content (system prompt, tool definitions, project instructions) first and unchanging; variable content (history, latest message) last. The layout — not a config flag — decides whether you pay 10% or 100% every turn.

Cached prefix — stable across turns

  • System prompt
  • Tool definitions
  • Project instructions (CLAUDE.md/AGENTS.md)

Dynamic tail — grows each turn

  • Conversation history
  • Latest user message
  • Tool results
← reused at ~10%
recomputed

Three bytes that bust the cache (silently)

~10%
cached read vs base
125–200%
cache write premium
10×
hit-vs-miss differential
0 errors
a miss is charged silently

A real Claude Code SDK bug busted the cache on every call — 12× cost, undetected until someone watched cache_read_input_tokens vs cache_creation_input_tokens. Misses never throw; they just bill.

The tension with everything you've learned

Attention says "rules at both edges." Caching says "static first, variable last." Do they fight?

At the front, they agree — your stable rules in primacy are good for attention and sit in the cached prefix.

At the back, they don't actually collide — because they work at different scopes. The whole instruction block is static, so its internal tail (your "critical rules, read last" restatement) is still inside the cached prefix. The truly variable content — history, the new message — comes after the entire instruction block.

The resolution is one rule: keep the instruction block static and put its critical rules at its own edges; push anything volatile (timestamps, cwd, per-session data) out of the prefix entirely. Volatile-in-prefix is the one move that loses on both axes — it busts the cache and adds noise.

↪ Your win: build a stable prefix and watch the meter

Retrieval practice — recall, don't peek

Question 1A cached-prefix read costs about…

Question 2Which busts the prompt cache?

Question 3For cache efficiency, variable content goes…

Question 4A prompt-cache miss…

Question 5 · spaced recall from Lesson 06Adding more instruction layers past the ceiling tends to…

Ask me anything. Want the break-even math for your session shape (the 62.5-minute 1-hour-TTL rule), or to check whether any always-loaded file in content/ smuggles volatile content into the prefix? That last one is exactly what the skill's new CE-9 check now hunts for.
✎ Feedback