Part 1 · The Loop

Harness Engineering · ~7 min

Instruction Files & Altitude

Your AGENTS.md is a context-budget decision, not documentation. Two levers govern it: how much it says, and at what altitude it says it.

Why this, for you: the instruction file is the first thing every session loads and the easiest thing to get wrong. People grow it one rule at a time until it quietly poisons every session. Two disciplines — keep it small, pitch it at the right altitude — fix that for good.

Legibility was Pillar 1. The instruction file (AGENTS.md, CLAUDE.md) is its front door. But more instruction is not more legible — past a point it's the opposite.

1 A table of contents, not an encyclopedia

The OpenAI Harness team named "one big AGENTS.md" as an early failure mode with four consequences: context crowding (the file steals budget from the task), attention dilution (when every rule is present, none is prominent), unverifiable scope (no one can tell what's still current), and instant rot (piecemeal edits accumulate contradictions).

Keep AGENTS.md to roughly 100 lines as a pointer map — what the project is, where conventions live, what to read first per task. The knowledge itself lives in a versioned docs/ directory the agent pulls on demand.

And the bloat has a name. The lecture that popularized this calls out the ratchet directly: "agent makes a mistake, you say 'add a rule to prevent this', add it to AGENTS.md, it works temporarily, agent makes a different mistake, add another rule, repeat, file bloats out of control." Each addition is one-way because no one can tell what's safe to delete.

Even good instruction files have a cost

An ETH Zürich study of 138 Python tasks found repository context files — even human-written ones — raised the number of agent steps by 19–20%, and LLM-generated context files cut task success by ~3% versus no file at all. The instruction file is a resource-allocation decision. Spend the budget only on what the agent can't discover itself.

2 Altitude: how to reason, not what to decide

The second lever is the level a rule is written at. Two failure modes bracket it. Too brittle: the file enumerates cases ("if the file ends in _test.go, check assertion coverage…") and breaks on every case the author didn't foresee. Too vague: "be helpful and accurate" constrains nothing, so the agent falls back to its pre-training defaults.

The right altitude tells the agent how to reason, not what to decide for each case — strong heuristics that generalize to inputs you never anticipated. Anthropic calls it the "Goldilocks zone."
Too brittleRight altitude
"Never edit files in /src/auth/""Treat auth code as high-risk; any edit needs the session/token impact understood first"
"Always add a test for every function""Changes to business logic need test coverage; types, formatting, and comments do not"

The practical test: if adding one instruction forces you to add three more to handle its edge cases, it was too brittle. Raise the altitude — state the principle, not the case. There's a measured reason to keep the list short, too: instruction-following accuracy degrades as rules accumulate, reaching only ~68% even for frontier models at a density of 500 simultaneous instructions.

When to drop to enumeration on purpose

Altitude is not always right. For hard security boundaries, enumerate explicitly — "never log values from the credentials object" closes a gap that "treat sensitive data carefully" leaves open. Same for regulated fixed-schema output and low-capability models that pattern-match better than they generalize.

↪ Your win: a small file, pitched high, that prunes itself

Retrieval practice — recall, don't peek

Question 1The recommended shape for AGENTS.md is…

Question 2"Treat auth code as high-risk; understand session impact first" is an example of…

Question 3One new instruction needing three more to patch its edge cases means it was…

Question 4For a hard security boundary, you should usually…

Question 5 · spaced recall from Lesson 01"The harness" in harness engineering means…

Ask me anything. Want the source/applicability/expiry annotation format for self-pruning rules, or help splitting a bloated CLAUDE.md into a pointer map plus docs/? Next in Part 2: Hooks — turning a "should-do" instruction into a "must-do" the model can't reason around.
✎ Feedback