The Harness Decision Table

Nineteen lessons, one reflex: read the symptom, reach for the harness move. This is the whole course as a lookup table — and a mixed review to prove it stuck.

Why this, for you: harness engineering is a habit, not a checklist. When an agent misbehaves, the instinct is to rewrite the prompt. This table retrains that instinct: most agent symptoms map to a specific, durable change in the environment — one that fixes the problem for every future session, not just this one.

The through-line of the whole course: agent failure is a signal about the environment. Diagnose the gap, make the smallest durable change, and bank it in the repo. Here's the diagnostic map.

1 Symptom → move

Symptom	Move	From
Agent can't find a convention it should know	Add a pointer in `AGENTS.md` to a `docs/` file — don't inline the whole thing	L2
Your instruction file keeps growing, one rule per mistake	Tag rules with source/applicability/expiry; raise altitude; prune on audit	L2
A rule is "usually followed" but must always hold	Promote it to a `PreToolUse` hook — exit 2 to block	L3
A long sub-task is flooding the main thread with noise	Delegate to a scoped sub-agent; only the result comes back	L4
The agent edited/deleted something outside its scope	Set the permission framework: allow/deny rules or a hard deny floor	L5
The agent declared "done" on broken code	Add a `Stop`-hook completion gate keyed to tests passing	L6
A wrong assumption cascaded through 500 lines	Verify per unit — fast `PostToolUse` typecheck between steps	L6
The task can't finish in one session	External done-condition + durable log + stateless harness; resume, don't restart	L7
A bloated agent definition loads on every run, mostly irrelevant	Progressive disclosure — tiny definition, detailed skills loaded on demand	L8
Pipeline edits keep forcing edits to the expert agent	Split the workflow into a command; keep expertise in the agent	L9
The agent dives into a multi-file change on a wrong assumption	Plan Mode — read-only explore + reviewable plan before any write	L10
Uniform max reasoning times out; uniform low misses risks	Reasoning sandwich — extra-high at plan/verify, high at execution	L11
One agent is slow on an independently-decomposable task	Orchestrator-worker — parallel workers, then synthesize; budget ~15× tokens	L12
Output needs iterative refinement against a checkable bar	Evaluator-optimizer loop with a round cap — but skip it on a strong baseline	L13
An agent mistake is expensive to undo, or a re-run duplicates state	Rollback-first + idempotency — one-command undo, check-before-act	L14
A live run has drifted onto the wrong file	Steer if recoverable; restart with a cleaner prompt if fundamentally wrong	L15
A permission rule fails, an injection lands, or the model misbehaves	Sandbox + scoped grant — bound the damage, not just the likelihood; microVM for untrusted code	L16
A long session keeps reasoning in the context dumb zone	Compact at the seams with a focus directive; lower the auto-trigger to 50–60%	L17
Spend runs up, or a stuck agent loops without progress	Route by complexity; trip a runtime circuit breaker (`maxTurns`, cost budget)	L18
You changed the harness but can't tell whether it helped	Ablate to rank subsystems, then hill-climb one variable against a held-out eval	L19
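
The `PreToolUse` row is the one most worth seeing as code. What follows is a minimal sketch, not a definitive implementation. It assumes the harness sends the pending tool call as JSON on stdin with `tool_name` and `tool_input` fields, and that exit code 2 means "block this call and show stderr to the model". The blocked patterns and the `decide` helper are hypothetical; check your harness's hook documentation for the exact payload.

```python
import json
import re
import sys

# Hypothetical must-hold rule: these shell commands are never allowed.
BLOCKED_PATTERNS = [r"\brm\s+-rf\b", r"\bgit\s+push\s+--force\b"]


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one pending tool call."""
    if event.get("tool_name") != "Bash":  # assumed field name
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")  # assumed field names
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return 2, f"Blocked by hook: command matches {pattern!r}"
    return 0, ""


def main(stdin=sys.stdin) -> int:
    """Read one event from stdin, print any block reason to stderr, return the exit code."""
    event = json.load(stdin)
    code, message = decide(event)
    if message:
        print(message, file=sys.stderr)
    return code

# When installed as a hook script, end the file with: sys.exit(main())
```

The decision lives in a pure function so the rule can be unit-tested without a running agent; the model never sees this code, only its verdict.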

2 The one rule under all of it

Reach for the environment before the prompt — and pick the mechanism by how non-negotiable the rule is. Bendable guidance → an instruction. Must-hold, checkable → a deterministic hook, allow/deny rule, or gate. A prompt fix helps one session; a committed harness fix helps every session after it.

That's the difference between mechanical enforcement and hope. Hooks, permission rules, and verification gates are run by the harness at fixed points — the model gets no vote. Instruction files and altitude shape what the model tends to do; the deterministic layer decides what it can do.
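
To make "the deterministic layer decides what it can do" concrete, here is a small sketch of allow/deny evaluation with a hard deny floor. The rule lists and the `may_write` helper are hypothetical, and a real harness has its own rule syntax; the point is only the ordering, where deny is checked first and anything unlisted is refused.

```python
from fnmatch import fnmatchcase

# Hypothetical rule sets for illustration only.
DENY = ["secrets/*", "*.pem", ".git/*"]
ALLOW = ["src/*", "tests/*", "docs/*"]


def may_write(path: str, allow=ALLOW, deny=DENY) -> bool:
    """Deny rules are checked first and always win; unlisted paths are refused."""
    if any(fnmatchcase(path, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(path, pattern) for pattern in allow)
```

Because the deny check runs before the allow check, a broad allow rule such as `src/*` cannot accidentally expose a key file under `src/`.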

The later parts add the same reflex at new layers:

Composition (L8–L9): put each piece of knowledge where it's loaded only when needed, with skills under an agent and workflow split from expertise.
Reasoning & planning (L10–L11): spend exploration and compute where ambiguity is highest, before execution locks in.
Multi-agent loops (L12–L13) and reversibility & control (L14–L15): scale out only when the task decomposes or the baseline is weak, and keep every action undoable, re-runnable, and steerable.
Containment & limits (L16–L18): three boundaries the harness enforces by construction. A sandbox caps the blast radius, compaction caps context before it rots, and a circuit breaker caps the spend.
Measuring the harness (L19): pin the model, ablate to rank subsystems, and hill-climb one variable against an eval. This is the loop that turns "environment beats model" from a claim into a number.

Each is the same move applied at a different layer: shape the environment, not just the prompt.
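
The "circuit breaker caps the spend" boundary can be sketched in a few lines. This is an illustration under assumptions, not any particular harness's API: the `CircuitBreaker` class, its field names, and the per-turn cost figure are all hypothetical, loosely modeled on a `maxTurns` limit plus a cost budget.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised by the harness loop, not the model, when a limit trips."""


@dataclass
class CircuitBreaker:
    max_turns: int        # hypothetical limit, analogous to a maxTurns setting
    max_cost_usd: float   # hypothetical cost budget
    turns: int = 0
    cost_usd: float = 0.0

    def record_turn(self, cost_usd: float) -> None:
        """Account for one completed turn; raise once either limit is exceeded."""
        self.turns += 1
        self.cost_usd += cost_usd
        if self.turns > self.max_turns:
            raise BudgetExceeded(f"turn limit {self.max_turns} exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost budget ${self.max_cost_usd:.2f} exceeded")
```

The harness loop would call `record_turn` after every model turn and stop the run when `BudgetExceeded` is raised, so a stuck agent cannot talk its way past the limit.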

Don't over-build the harness

Every lesson had a backfire box, and they rhyme: the harness is an investment that pays off across many sessions, not one. For prototypes, short-lived tools, and throwaway code, custom linters, layered docs, and resume machinery cost more than they return. Build the guardrail when the failure is recurring and the codebase is durable, not on reflex.

↪ Your win: a harness-engineering reflex

Read the symptom, reach for the move — the table above is the whole course in one glance.
Match mechanism to non-negotiability — instruction for bendable, hook/rule/gate for must-hold.
Bank every fix in the repo — turn each failure into a durable environment change.
Anchor "done" to deterministic signals — tests and exit codes, never self-report.
Right-size the investment — build guardrails for recurring failures on durable codebases, not prototypes.
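
Anchoring "done" to a deterministic signal might look like the following completion gate. It is a sketch under assumptions: the test command is hypothetical, and it assumes the harness runs this at its `Stop` hook and treats exit code 2 as "not done, feed the reason back to the model".

```python
import subprocess
import sys

# Hypothetical test command; substitute the project's own.
TEST_COMMAND = [sys.executable, "-m", "pytest", "-q"]


def completion_gate(command=TEST_COMMAND, timeout_s: int = 600) -> tuple[int, str]:
    """Run the test command; return (0, "") on success, (2, reason) otherwise."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return 2, f"Tests did not finish within {timeout_s}s; not done yet."
    if result.returncode != 0:
        # Keep only the tail of the output so the feedback stays short.
        tail = (result.stdout + result.stderr)[-2000:]
        return 2, f"Tests failed (exit {result.returncode}); keep working.\n{tail}"
    return 0, ""
```

The gate trusts only the process exit code. The agent's own claim that the work is finished never enters the decision.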

Mixed review — across all nineteen lessons

Question 1 · from L1: The core reframe of harness engineering is that agent failure is…

Question 2 · from L16: Sandboxing contains an agent by bounding the…

Question 3 · from L17: Manual compaction beats waiting for the 95% auto-trigger because by then the agent has…

Question 4 · from L18: For a safety-critical stop, the most reliable circuit breaker is enforced by the…

Question 5 · from L19: To measure a harness change cleanly, isometric ablation requires that you hold fixed the…

You finished the course. Ask me to apply the decision table to a real repo of yours, draft a starter settings.json (hooks + permission rules), or wire up a long-running harness with an initializer and a coding agent. Or revisit any lesson — the mechanics compound when you use them together.
