A perfectly-written rule in the wrong position is a rule the model is statistically likely to ignore.
Why this, for you: straight at priorities #2 and #3 — harness design and authoring. This is the
rule that governs how you lay out CLAUDE.md / AGENTS.md and system prompts. Get it right and
your instructions get followed; get it wrong and they're technically present but practically invisible.
In Lesson 01 you learned how much context costs you. This lesson is about where
in the window attention actually lands.
Transformer attention is U-shaped: strongest at the start and end
of the context, weakest in the middle — regardless of how important the content is. Position
affects whether an instruction is followed even when the wording is identical.
The U-shape traces to causal masking interacting with relative positional encodings (RoPE) — a structural property, not a per-model quirk.
Two consequences worth internalising. First, adding content to the middle degrades its neighbours:
every instruction you insert pushes existing ones further from the high-attention edges. A sprawling
AGENTS.md buries most of its own rules in the dead zone. Second, the middle is for reference,
not rules — schemas, examples, lookup tables tolerate mid-context placement because the agent actively
pulls them, rather than relying on passive attention.
The layout that follows the curve
# AGENTS.md## Critical Rules (read first)
- Never commit directly to `main`; always open a PR
- Secrets via environment variables — never hardcode
## Reference: Project Structure # middle = reference src/api src/services src/models tests/## Reference: Conventions camelCase vars · PascalCase classes · async/await## Closing Reminders (read last)
- Run `npm test` before marking any task complete
- Never commit directly to `main` # restated at the tail on purpose
↪ Your win: place by attention, and restate at the tail
Most critical rules first; next-most-critical last. Reference material in the middle.
Keep instruction files short — a smaller file has a smaller dead middle.
In a long session where the agent forgot an early constraint, restate it at the end — don't assume it'll scroll back.
After a /compact, re-state the objective so it sits in the high-attention tail (this is where Lessons 01 and 02 meet).
Retrieval practice — recall, don't peek
Question 1The U-shaped attention curve is best explained by…
Question 2Must-follow rules belong…
Question 3The middle of a long instruction file is best used for…
Question 4 · spaced recall from Lesson 01Reasoning tasks effectively use roughly what share of the window?
Ask me anything. Want to refactor one of your real CLAUDE.md files to the
attention curve and see the diff? Curious how this interacts with prompt caching (static-first) or with
Critical Instruction Repetition? Say the word and we'll go there, or roll to Lesson 03.