What a Harness Is

The model is one part of the system. Everything around it — the loop, the tools, the rules — is the harness, and it decides more of your output quality than the model does.

Why this, for you: when an agent fails, your instinct is to blame the model or rewrite the prompt. This lesson reframes the whole course: the highest-leverage edits are usually to the environment the model runs in — and those edits compound across every future session.

A coding agent is not just a model. It is a model running inside a loop: read context, call a tool, see the result, decide again, stop. The non-model code that mediates all of that — tools, context management, delegation, safety, orchestration — is the harness. Harness engineering is the discipline of designing it.
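To make the split between model and harness concrete, here is a minimal sketch of that loop. It rests on assumptions: the `Action` shape, the `run_agent` function, and the scripted stand-in model are hypothetical names invented for illustration, not any vendor's API. Everything except the `model(context)` call is harness code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """What the model decided to do next (hypothetical shape)."""
    tool: Optional[str]   # name of the tool to call, or None to stop
    argument: str = ""    # input passed to the tool
    answer: str = ""      # final answer, used when tool is None


def run_agent(model: Callable[[list], Action],
              tools: dict,
              task: str,
              max_steps: int = 10) -> str:
    """Harness loop: read context, call a tool, see the result, decide again, stop."""
    context = [f"task: {task}"]                 # context management
    for _ in range(max_steps):                  # orchestration: bounded loop
        action = model(context)                 # the only model call
        if action.tool is None:                 # the model chose to stop
            return action.answer
        if action.tool not in tools:            # safety: unknown tools are refused
            context.append(f"error: unknown tool {action.tool!r}")
            continue
        result = tools[action.tool](action.argument)                        # call a tool
        context.append(f"{action.tool}({action.argument!r}) -> {result}")   # see the result
    return "stopped: step limit reached"


def scripted_model(context: list) -> Action:
    """Stand-in for a real model: reads one file, then stops."""
    if len(context) == 1:
        return Action(tool="read_file", argument="README.md")
    return Action(tool=None, answer=f"done after {len(context) - 1} tool call(s)")


tools = {"read_file": lambda path: f"<contents of {path}>"}
result = run_agent(scripted_model, tools, "summarize the README")
print(result)
```

Swapping `scripted_model` for a real model changes one argument; the step limit, the tool registry, and the context format are all harness decisions that stay under your control.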

1 The headline result: environment beats model

The surprising, repeatedly measured finding is that you can hold the model fixed, change only the harness, and still move output quality substantially.

LangChain took Terminal-Bench 2.0 from 52.8% to 66.5%, a gain of 13.7 percentage points, through harness changes alone, with the model held fixed. OpenAI shipped roughly a million lines of production code in a five-month experiment with no manually written source; the enabler was environment design, not a smarter model.

Anthropic, OpenAI, LangChain, and Martin Fowler's team each published findings converging on one conclusion: environment quality determines agent output quality more than model capability or prompt sophistication. Prompt engineering is a sub-part of the larger job — designing the loop the prompt lives in.

2 The three pillars

Harness engineering treats the repository as the agent's primary interface: if something is not in the repo, it does not exist for the agent. Three pillars make agents succeed by default.

Pillar	What it means	Example
Legibility	The agent can find, read, and act on project knowledge	A compact `AGENTS.md` index pointing to deeper docs
Mechanical enforcement	A violation is impossible or immediately visible, not merely discouraged by instruction	A linter blocking a cross-layer import at the point of decision
Constrained solution spaces	Fewer valid architectures means fewer wrong ones	A dependency chain the tooling refuses to let you violate
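The second and third rows of the table can be illustrated with a small import check. This is a sketch under stated assumptions, not a description of any particular linter: the three layer names in `LAYER_ORDER` and the rule "a module may import only from its own layer or a lower one" are hypothetical, chosen only to show the shape of a mechanical check.

```python
import ast

# Hypothetical layering, lowest to highest. A module may import only from
# its own layer or a lower one.
LAYER_ORDER = ["domain", "service", "api"]


def layer_of(module_name: str):
    """Return the layer index for a dotted module name, or None if unlayered."""
    top = module_name.split(".")[0]
    return LAYER_ORDER.index(top) if top in LAYER_ORDER else None


def find_violations(source: str, module_name: str) -> list:
    """List every import in `source` that reaches into a higher layer."""
    own = layer_of(module_name)
    if own is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            imported = [node.module]   # absolute "from x import y" only
        else:
            continue
        for name in imported:
            target = layer_of(name)
            if target is not None and target > own:
                violations.append(
                    f"{module_name}:{node.lineno}: "
                    f"{LAYER_ORDER[own]} layer must not import {name}"
                )
    return violations


bad = "import json\nfrom api.routes import router\n"
good = "from domain.models import User\n"

violations = find_violations(bad, "domain.models")
print(violations)
print(find_violations(good, "service.users"))
```

Run in CI or as a pre-commit step, a check like this turns a written rule into one the agent cannot silently break, and the error message tells it exactly which line to fix.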

The next six lessons each take one slice of this: instruction files (legibility), hooks (mechanical enforcement), sub-agents, permissions, verification gates, and long-running state.

3 The feedback signal: failure is diagnostic

The mental shift that makes harness engineering pay off: when an agent struggles, the struggle is a signal about the environment, not a verdict on the agent. The agent imported the wrong module because nothing stopped it; the fix is to add the guardrail, the tool, or the doc — and feed it back into the repo.

# the loop that compounds
agent struggles
→ diagnose the gap
→ add a tool / guardrail / doc
→ commit it to the repo
→ every future session inherits the fix

This is what makes the harness worth engineering: a prompt fix helps one session; a harness fix helps all of them.

Where the harness won't save you

Linters and CI operate at the syntax and architecture layer, not the intent layer. Harness engineering reliably catches import violations and format errors; it does not reliably catch misdiagnosis, overengineering, or misunderstood instructions. And for prototypes or throwaway code, building custom linters and layered docs costs more than it returns — the investment pays off across many sessions, not one.

↪ Your win: edit the environment, not just the prompt

Separate the two layers — the model, and the harness it runs in. Most "the model is dumb" problems are harness problems.
Reach for the environment first — a tool, a guardrail, or a doc usually beats a longer prompt.
Treat the repo as the interface — if it's not in the repo, it doesn't exist for the agent.
Bank every fix — turn each failure into a committed harness change so all future sessions inherit it.
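Treating the repo as the interface can itself be checked mechanically. The sketch below assumes, hypothetically, that `AGENTS.md` is written in Markdown and points to deeper docs with relative links of the form `[text](path)`; the function name `missing_doc_links` and the sample file names are invented for illustration. It reports links whose targets are not in the repo, which are exactly the docs that do not exist for the agent.

```python
import re
import tempfile
from pathlib import Path

# Matches Markdown links of the form [text](target) and captures the target.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def missing_doc_links(repo_root: Path, index_name: str = "AGENTS.md") -> list:
    """Return link targets in the index file that do not exist in the repo."""
    index = repo_root / index_name
    if not index.is_file():
        return [index_name]                      # the index itself is missing
    missing = []
    for target in LINK.findall(index.read_text(encoding="utf-8")):
        if "://" in target or target.startswith("#"):
            continue                             # skip external URLs and in-page anchors
        path = target.split("#")[0]              # drop any trailing #fragment
        if not (repo_root / path).exists():
            missing.append(target)
    return missing


# Demo on a throwaway repo: one link resolves, one does not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "architecture.md").write_text("# Architecture\n", encoding="utf-8")
    (root / "AGENTS.md").write_text(
        "- [Architecture](docs/architecture.md)\n"
        "- [Testing](docs/testing.md)\n",
        encoding="utf-8",
    )
    missing = missing_doc_links(root)

print(missing)
```

Committing a check like this is one way to bank a legibility fix: once it runs on every change, a dangling pointer in the index fails fast instead of misleading a later session.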

Retrieval practice — recall, don't peek

Question 1: In harness engineering, "the harness" refers to…

Question 2: LangChain's Terminal-Bench jump from 52.8% to 66.5% came from…

Question 3: "If it's not in the repo, it doesn't exist for the agent" is the pillar of…

Question 4: When an agent struggles, harness engineering reads it as…

Question 5: The harness does not reliably catch which kind of problem?

Next in Part 1: Instruction Files & Altitude, on making the repo legible without drowning the agent in rules.