Harness Engineering · ~7 min
Your AGENTS.md is a context-budget decision, not documentation. Two levers govern it: how much it says, and at what altitude it says it.
Legibility was Pillar 1. The instruction file (AGENTS.md, CLAUDE.md) is its
front door. But more instruction is not more legible — past a point it's the opposite.
The OpenAI Harness team named "one big AGENTS.md" as an early failure mode with four consequences:
context crowding (the file steals budget from the task), attention dilution (when
every rule is present, none is prominent), unverifiable scope (no one can tell what's still current),
and instant rot (piecemeal edits accumulate contradictions).
The fix is to keep AGENTS.md to roughly 100 lines and treat it as a pointer map: what the project
is, where conventions live, what to read first per task. The knowledge itself lives in a versioned docs/
directory the agent pulls on demand.
The bloat has a name. The lecture that popularized this calls out the ratchet directly: "agent makes a mistake, you say 'add a rule to prevent this', add it to AGENTS.md, it works temporarily, agent makes a different mistake, add another rule, repeat, file bloats out of control." Each addition is one-way because no one can tell what's safe to delete.
An ETH Zürich study of 138 Python tasks found repository context files — even human-written ones — raised the number of agent steps by 19–20%, and LLM-generated context files cut task success by ~3% versus no file at all. The instruction file is a resource-allocation decision. Spend the budget only on what the agent can't discover itself.
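A roughly 100-line pointer map can be checked mechanically. The sketch below is one possible lint, not a tool the lesson prescribes: the 100-line limit mirrors the guideline above, while the function name `lint_instruction_file`, the `docs/*.md` pointer pattern, and the message wording are all assumptions made for illustration.

```python
import re
from pathlib import Path

MAX_LINES = 100  # assumed budget, mirroring the "roughly 100 lines" guideline

# Matches relative pointers such as docs/testing.md (hypothetical layout).
POINTER_RE = re.compile(r"\bdocs/[\w./-]+\.md\b")


def lint_instruction_file(path: Path, repo_root: Path, max_lines: int = MAX_LINES) -> list[str]:
    """Return human-readable problems; an empty list means the file passes."""
    text = path.read_text(encoding="utf-8")
    problems = []

    # Lever one: how much the file says.
    line_count = len(text.splitlines())
    if line_count > max_lines:
        problems.append(f"{path.name} has {line_count} lines (budget: {max_lines})")

    # A pointer map is only useful if every pointer resolves to a real file.
    for pointer in sorted(set(POINTER_RE.findall(text))):
        if not (repo_root / pointer).is_file():
            problems.append(f"dangling pointer: {pointer}")

    return problems
```

Run in CI, a check like this makes growth visible at review time, so going over budget becomes a deliberate decision rather than a side effect of the ratchet.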
The second lever is the level a rule is written at. Two failure modes bracket it. Too brittle:
the file enumerates cases ("if the file ends in _test.go, check assertion coverage…") and breaks on every
case the author didn't foresee. Too vague: "be helpful and accurate" constrains nothing, so the agent
falls back to its pre-training defaults.
| Too brittle | Right altitude |
|---|---|
| "Never edit files in /src/auth/" | "Treat auth code as high-risk; any edit needs the session/token impact understood first" |
| "Always add a test for every function" | "Changes to business logic need test coverage; types, formatting, and comments do not" |
The practical test: if adding one instruction forces you to add three more to handle its edge cases, it was too brittle. Raise the altitude — state the principle, not the case. There's a measured reason to keep the list short, too: instruction-following accuracy degrades as rules accumulate, reaching only ~68% even for frontier models at a density of 500 simultaneous instructions.
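To see whether the rule list is creeping upward, a crude proxy is to count lines that read like hard rules. This is a minimal sketch under stated assumptions: the marker words in `RULE_MARKERS` and the function `count_rules` are hypothetical, and a keyword match is only a rough stand-in for what a model actually treats as an instruction.

```python
import re

# Hypothetical markers of a hard rule; tune the list to your own file's vocabulary.
RULE_MARKERS = re.compile(r"\b(never|always|must|do not|don't)\b", re.IGNORECASE)


def count_rules(text: str) -> int:
    """Count lines that contain at least one imperative marker."""
    return sum(1 for line in text.splitlines() if RULE_MARKERS.search(line))


sample = (
    "Never edit generated files.\n"
    "This project is a billing service.\n"
    "Always run the linter.\n"
    "Tests live in tests/.\n"
)
print(count_rules(sample))  # 2
```

Tracking this number across commits shows whether each new rule replaced an old one or simply piled on top of it.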
A high altitude is not always right. For hard security boundaries, enumerate explicitly: "never log
values from the credentials object" closes a gap that "treat sensitive data carefully" leaves open.
Same for regulated fixed-schema output and low-capability models that pattern-match better than they generalize.
Make AGENTS.md a ~100-line pointer map; push the actual knowledge into docs/.
Retrieval practice: recall, don't peek
Question 1: The recommended shape for AGENTS.md is…
Question 2: "Treat auth code as high-risk; understand session impact first" is an example of…
Question 3: One new instruction needing three more to patch its edge cases means it was…
Question 4: For a hard security boundary, you should usually…
Question 5 · spaced recall from Lesson 01: "The harness" in harness engineering means…
CLAUDE.md into a pointer map plus docs/?
Next in Part 2: Hooks, which turn a "should-do" instruction into a "must-do" the model can't reason around.