Context Engineering · ~5 min

Lost in the Middle

A perfectly written rule in the wrong position is a rule the model is statistically likely to ignore.

Why this, for you: it goes straight at priorities #2 and #3, harness design and authoring. This is the rule that governs how you lay out CLAUDE.md / AGENTS.md files and system prompts. Get it right and your instructions get followed; get it wrong and they're technically present but practically invisible.

In Lesson 01 you learned how much context costs you. This lesson is about where in the window attention actually lands.

Transformer attention is U-shaped: strongest at the start and end of the context, weakest in the middle — regardless of how important the content is. Position affects whether an instruction is followed even when the wording is identical.
[Figure: U-shaped curve of attention weight across the context (start, middle, end), with rules placed at both edges and reference material in the middle.]
The U-shape traces to causal masking interacting with relative positional encodings (RoPE) — a structural property, not a per-model quirk.
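The causal-masking half of that explanation can be made concrete with a minimal sketch. Under a causal mask, query position i may only attend to keys 0..i, so earlier tokens are visible to more queries. The helper `causal_visibility` below is hypothetical and only counts visibility. It ignores learned attention weights and does not model the recency side of the curve, which the caption attributes to RoPE.

```python
from __future__ import annotations


def causal_visibility(n_tokens: int) -> list[int]:
    """For each key position j, count the query positions allowed to attend to it.

    Hypothetical helper: under a causal mask, query i sees keys 0..i only,
    so key j is visible to queries j..n_tokens-1.
    """
    counts = []
    for j in range(n_tokens):
        # Column j of the causal mask: True wherever query i >= key j.
        column = [i >= j for i in range(n_tokens)]
        counts.append(sum(column))
    return counts


print(causal_visibility(6))  # [6, 5, 4, 3, 2, 1]
```

The first token can be attended to by every query and the last token by only one; that asymmetry comes from the mask itself, not from any particular model's weights.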

Two consequences worth internalising. First, adding content to the middle degrades its neighbours: every instruction you insert pushes existing ones further from the high-attention edges. A sprawling AGENTS.md buries most of its own rules in the dead zone. Second, the middle is for reference, not rules — schemas, examples, lookup tables tolerate mid-context placement because the agent actively pulls them, rather than relying on passive attention.
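The first consequence can be sketched with a deliberately crude model. Assume, hypothetically, that only the first and last `EDGE_WINDOW` items of a file sit in the high-attention zone and everything else counts as the middle. Real attention is a smooth curve, not a step, and the window size here is an arbitrary illustration, not a measured value. Even so, growing the top block is enough to push a previously safe rule into the middle.

```python
from __future__ import annotations

# Hypothetical model: the first and last EDGE_WINDOW items of a file get
# strong attention; everything else is treated as the low-attention middle.
EDGE_WINDOW = 2


def in_dead_zone(position: int, length: int, window: int = EDGE_WINDOW) -> bool:
    """True if at least `window` items separate this item from both edges."""
    return min(position, length - 1 - position) >= window


def buried(items: list[str], window: int = EDGE_WINDOW) -> list[str]:
    """Return the items that land in the low-attention middle."""
    n = len(items)
    return [item for i, item in enumerate(items) if in_dead_zone(i, n, window)]


original = ["rule A", "rule B", "rule C", "rule D"]
print(buried(original))  # []

# Insert four new rules right after the first one, as happens when a
# "Critical Rules" block keeps growing.
grown = original[:1] + ["new 1", "new 2", "new 3", "new 4"] + original[1:]
print(buried(grown))  # ['new 2', 'new 3', 'new 4', 'rule B']
```

The text of `rule B` never changed; four insertions above it were enough to move it out of the edge zone.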

The layout that follows the curve

```markdown
# AGENTS.md

## Critical Rules (read first)
- Never commit directly to `main`; always open a PR
- Secrets via environment variables, never hardcoded

## Reference: Project Structure
<!-- middle = reference -->
- src/api
- src/services
- src/models
- tests/

## Reference: Conventions
camelCase vars · PascalCase classes · async/await

## Closing Reminders (read last)
- Run `npm test` before marking any task complete
- Never commit directly to `main` <!-- restated at the tail on purpose -->
```
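If a harness generates instruction files instead of you writing them by hand, the same layout can be enforced in code. A minimal sketch, assuming a hypothetical `build_instruction_file` helper and a file made of simple bullet lists: critical rules go at the head, reference sections in the middle, and the closing block ends with the rules you choose to restate, so they occupy the last lines of the file.

```python
from __future__ import annotations


def build_instruction_file(
    critical: list[str],
    reference: dict[str, list[str]],
    closing: list[str],
    restate: list[str],
) -> str:
    """Lay out rules by attention: critical rules at the head, reference in
    the middle, closing reminders plus restated rules at the tail."""
    unknown = [r for r in restate if r not in critical]
    if unknown:
        raise ValueError(f"can only restate critical rules, got {unknown}")

    lines = ["# AGENTS.md", "", "## Critical Rules (read first)"]
    lines += [f"- {rule}" for rule in critical]
    for title, entries in reference.items():  # middle = reference
        lines += ["", f"## Reference: {title}"]
        lines += [f"- {entry}" for entry in entries]
    lines += ["", "## Closing Reminders (read last)"]
    lines += [f"- {rule}" for rule in closing + restate]  # restated at the tail
    return "\n".join(lines) + "\n"


doc = build_instruction_file(
    critical=["Never commit directly to `main`; always open a PR",
              "Secrets via environment variables, never hardcoded"],
    reference={"Project Structure": ["src/api", "src/services", "src/models", "tests/"],
               "Conventions": ["camelCase vars", "PascalCase classes", "async/await"]},
    closing=["Run `npm test` before marking any task complete"],
    restate=["Never commit directly to `main`; always open a PR"],
)
print(doc)
```

Requiring `restate` to be a subset of `critical` keeps the head and tail copies word-for-word identical, so the two cannot drift apart as the file is edited.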

↪ Your win: place by attention, and restate at the tail
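A quick way to audit an existing file against that takeaway is to check that every must-follow rule at the head is restated at the tail. A minimal sketch, assuming a hypothetical file convention in which the first `## ` section holds the critical rules, the last `## ` section holds the closing reminders, and rules are `- ` bullets compared as exact strings.

```python
from __future__ import annotations


def sections(text: str) -> list[tuple[str, list[str]]]:
    """Split a markdown instruction file into (heading, bullet items) pairs."""
    result: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        if line.startswith("## "):
            result.append((line[3:].strip(), []))
        elif line.startswith("- ") and result:
            # Attach the bullet to the most recent "## " section.
            result[-1][1].append(line[2:].strip())
    return result


def missing_from_tail(text: str) -> list[str]:
    """Critical rules (first section) that are not restated in the last section."""
    parts = sections(text)
    if len(parts) < 2:
        return []
    head_rules, tail_rules = parts[0][1], parts[-1][1]
    return [rule for rule in head_rules if rule not in tail_rules]


sample = """# AGENTS.md
## Critical Rules (read first)
- Never commit directly to `main`
- Secrets via environment variables
## Reference: Conventions
- camelCase vars
## Closing Reminders (read last)
- Run `npm test` before marking any task complete
- Never commit directly to `main`
"""
print(missing_from_tail(sample))  # ['Secrets via environment variables']
```

Exact matching is strict: run against the example layout above, it would also flag the `main` rule, because the tail copy drops "; always open a PR". A real linter would need fuzzier matching, or both copies generated from one source.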

Retrieval practice — recall, don't peek

Question 1: The U-shaped attention curve is best explained by…

Question 2: Must-follow rules belong…

Question 3: The middle of a long instruction file is best used for…

Question 4 · spaced recall from Lesson 01: Reasoning tasks effectively use roughly what share of the window?

Ask me anything. Want to refactor one of your real CLAUDE.md files to the attention curve and see the diff? Curious how this interacts with prompt caching (static-first) or with Critical Instruction Repetition? Say the word and we'll go there, or roll to Lesson 03.