Prompt Engineering · ~7 min
A rule you got right still fails if it's buried in a wall of others. The instruction file is its own context problem — and the fix is to make it a map, not the territory.
AGENTS.md quietly recreates the ceiling you learned to
avoid — and the cure isn't better rules, it's a different shape for the file itself.Every fix you've learned operates on one rule at a time. But rules accumulate in a file, and the file
has its own failure mode: it grows. "One big AGENTS.md" is a named early mistake — and the way out is to
treat the file as an index into knowledge that lives elsewhere.
The OpenAI Harness team named "one big AGENTS.md" as an early failure mode with four specific
consequences (OpenAI — Harness Engineering):
You'll recognise the first two: they are the compliance ceiling (Lesson 4) arriving through the back door. The file didn't get worse rules — it got more of them.
The fix is structural. AGENTS.md becomes a brief index — what this project is, where conventions live,
what to read first per task type. The knowledge itself moves into a versioned docs/ directory.
An entry reads: "For API conventions, see docs/conventions/api.md." The agent follows the
pointer when it needs that context, instead of carrying the whole encyclopedia in every session. The test for any line:
could it be replaced by a pointer to docs/? Then it belongs there, not here.
A pointer map controls size structurally. But the file refills temporally — the classic ratchet: agent errs → "add a rule to prevent this" → file bloats out of control. Each addition is one-way because no one can tell what's safe to delete.
The discipline that closes the loop is per-rule metadata — three fields on every terminal rule:
| Field | Question it answers |
|---|---|
| Source | Why was this rule added? (the incident, PR, failure) |
| Applicability | When does it fire? (file pattern, task type, branch) |
| Expiry | What observable retires it? (feature removed, never fires for N weeks) |
The metadata converts deletion from a judgement call into a closed-form predicate: has the expiry observable fired? The default flips from "keep when uncertain" to "delete when expired." Anthropic teaches a compatible rule without naming the triple — "If Claude already does something correctly without the instruction, delete it or convert it to a hook" (Claude Code best practices).
Pointers only work if the linked docs exist and stay current — and human maintenance is exactly what rots. So make
it deterministic: CI that breaks on a broken link from AGENTS.md to docs/, lint rules that
flag orphaned docs, an audit script that greps expiry: lines and opens a removal PR when the observable
fires. The Harness team runs linters plus a recurring "doc-gardening" agent for exactly this.
An ETH Zurich study of 138 Python tasks found repository context files raised agent steps 19–20%, and LLM-generated context files reduced task success ~3% versus none at all (Gloaguen et al. 2026). The pointer map assumes the agent follows pointers and that humans wrote the docs. For a small, conventional repo where the README and file layout already answer "what and where," an index just adds tokens.
AGENTS.md points; docs/ holds the knowledge.docs/, make it one.Retrieval practice — recall, don't peek
Question 1A single growing AGENTS.md mainly fails because it…
Question 2In the pointer-map pattern, the actual knowledge lives in…
Question 3The three lifecycle fields on a rule are source, applicability, and…
Question 4The most reliable way to keep pointers from going stale is to…
Question 5 · spaced recall from Lesson 6For format and style in a codebase, a hint to existing code beats an inline sample because it…