Harness Engineering · ~6 min
The model is one part of the system. Everything around it — the loop, the tools, the rules — is the harness, and it decides more of your output quality than the model does.
A coding agent is not just a model. It is a model running inside a loop: read context, call a tool, see the result, decide again, stop. The non-model code that mediates all of that — tools, context management, delegation, safety, orchestration — is the harness. Harness engineering is the discipline of designing it.
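To make the loop concrete, here is a minimal sketch of it in Python. Everything in it is a hypothetical stand-in chosen for illustration: the `Action` type, the `run_agent` function, the `read_file` tool, and the scripted stub that plays the model. A real harness would call an actual model API and add context management, permissions, and delegation around the same skeleton.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """What the model decided to do next (hypothetical shape)."""
    kind: str          # "tool" or "stop"
    tool: str = ""     # tool name when kind == "tool"
    arg: str = ""      # single string argument, kept simple on purpose
    answer: str = ""   # final answer when kind == "stop"


Tool = Callable[[str], str]
Model = Callable[[List[str]], Action]


def run_agent(model: Model, tools: Dict[str, Tool], task: str,
              max_steps: int = 10) -> str:
    """Harness loop: read context, call a tool, see the result, decide again, stop."""
    context = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(context)              # the model decides
        if action.kind == "stop":
            return action.answer
        tool = tools.get(action.tool)
        if tool is None:
            # Unknown tools become a visible result instead of a crash.
            result = f"error: unknown tool {action.tool!r}"
        else:
            result = tool(action.arg)        # the harness executes
        context.append(f"{action.tool}({action.arg}) -> {result}")
    return "stopped: step budget exhausted"  # the harness, not the model, enforces this


# Stand-in for a real model: one tool call, then stop.
def scripted_model(context: List[str]) -> Action:
    if len(context) == 1:
        return Action(kind="tool", tool="read_file", arg="README.md")
    return Action(kind="stop", answer=f"done after {len(context) - 1} tool call(s)")


FILES = {"README.md": "hello"}


def read_file(path: str) -> str:
    return FILES.get(path, "error: not found")


result = run_agent(scripted_model, {"read_file": read_file}, "summarize the README")
print(result)
```

Note that the step budget, the tool registry, and the handling of unknown tools all live outside the model. Those are the parts harness engineering designs.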
The surprising, repeatedly measured finding is that you can hold the model fixed, change only the harness, and still see output quality shift substantially.
Anthropic, OpenAI, LangChain, and Martin Fowler's team each published findings converging on one conclusion: environment quality determines agent output quality more than model capability or prompt sophistication. Prompt engineering is a sub-part of the larger job — designing the loop the prompt lives in.
Harness engineering treats the repository as the agent's primary interface: if something is not in the repo, it does not exist for the agent. Three pillars make agents succeed by default.
| Pillar | What it means | Example |
|---|---|---|
| Legibility | The agent can find, read, and act on project knowledge | A compact AGENTS.md index pointing to deeper docs |
| Mechanical enforcement | A violation is impossible or immediately visible, rather than something the agent is merely asked to avoid | A linter blocking a cross-layer import at the point of decision |
| Constrained solution spaces | Fewer valid architectures means fewer wrong ones | A dependency chain the tooling refuses to let you violate |
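As one illustration of the mechanical-enforcement and constrained-solution-space rows, the sketch below uses Python's standard `ast` module to flag cross-layer imports. The layer names (`ui`, `service`, `data`), the `ALLOWED` map, and the `find_violations` function are assumptions made up for this example; the lesson does not name a specific linter or layering scheme.

```python
import ast
from typing import List, Optional

# Hypothetical dependency chain: ui -> service -> data.
# Each layer may import only itself and the layers listed for it.
ALLOWED = {
    "ui": {"ui", "service"},
    "service": {"service", "data"},
    "data": {"data"},
}


def layer_of(module: str) -> Optional[str]:
    """Map a dotted module name to its layer, or None if it is outside the layering."""
    top = module.split(".")[0]
    return top if top in ALLOWED else None


def find_violations(source: str, module: str) -> List[str]:
    """Return one message per import in `source` that breaks the layer rules."""
    own = layer_of(module)
    if own is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            targets = [node.module]          # absolute "from x import y" only
        else:
            continue
        for target in targets:
            dep = layer_of(target)
            if dep is not None and dep not in ALLOWED[own]:
                violations.append(
                    f"{module}:{node.lineno}: layer '{own}' may not import "
                    f"'{target}' (layer '{dep}')"
                )
    return violations


sample = "import data.db\nfrom service import orders\n"
violations = find_violations(sample, "ui.checkout")
for message in violations:
    print(message)
```

Run from a pre-commit hook or CI step, a check like this turns a written rule into feedback the agent sees at the moment it makes the mistake. The message names the file, line, and rule, so the agent can correct itself without extra prompting.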
The next six lessons each take one slice of this: instruction files (legibility), hooks (mechanical enforcement), sub-agents, permissions, verification gates, and long-running state.
The mental shift that makes harness engineering pay off: when an agent struggles, the struggle is a signal about the environment, not a verdict on the agent. The agent imported the wrong module because nothing stopped it; the fix is to add the guardrail, the tool, or the doc — and feed it back into the repo.
This is what makes the harness worth engineering: a prompt fix helps one session; a harness fix helps all of them.
Linters and CI operate at the syntax and architecture layer, not the intent layer. Harness engineering reliably catches import violations and format errors; it does not reliably catch misdiagnosis, overengineering, or misunderstood instructions. And for prototypes or throwaway code, building custom linters and layered docs costs more than it returns — the investment pays off across many sessions, not one.
Retrieval practice — recall, don't peek
Question 1: In harness engineering, "the harness" refers to…
Question 2: LangChain's Terminal-Bench jump from 52.8% to 66.5% came from…
Question 3: "If it's not in the repo, it doesn't exist for the agent" is the pillar of…
Question 4: When an agent struggles, harness engineering reads it as…
Question 5: The harness does not reliably catch which kind of problem?