A Hands-On Course · 20 lessons

Harness Engineering

Design the agent loop itself — tools, hooks, sub-agents, permissions, and verification gates.

Short lessons (~5–8 min each), each with one tangible win and a retrieval-practice quiz. Built for engineers who already use AI coding tools and want the non-obvious mechanics.

Grounded in the agentpatterns.ai corpus (CC BY 4.0). Keep the Glossary open as you go.

Part 1 · The Loop

1. What a Harness Is
The model is one part of the system. Everything around it (the loop, the tools, the rules) is the harness, and it decides more of your output quality than the model does.

2. Instruction Files & Altitude
Your AGENTS.md is a context-budget decision, not documentation. Two levers govern it: how much it says, and at what altitude it says it.

Part 2 · Guardrails

3. Hooks: Deterministic Guardrails
An instruction is a "should-do" the model can reason around. A hook is a "must-do" the harness runs no matter what the model decides.

4. Sub-Agents & Orchestration
A sub-agent is a fresh context window you spend a task into and get only the answer back. The isolation is the feature, and the bill.

Part 3 · Safety & State

5. Permissions & Safety Boundaries
The model can't reliably judge what it's allowed to touch. So don't make it. Put the boundary in the harness, where a misjudgment can't become an action.

6. Verification Gates
An agent will tell you it's done. The harness's job is to not believe it, and to make "done" mean a deterministic check passed.

7. Long-Running Agents
When a run outlives a context window, state has to live outside the model. The harness becomes a way to resume a task that no single session can hold.

Part 4 · Composition

8. Skills & Progressive Disclosure
An agent definition is loaded on every invocation, whether the task needs it or not. Skills split the knowledge so only the slice the task requires ever enters the window.

9. Commands vs Agents
Two concerns quietly collapse into one file: the workflow (what steps run, in what order) and the expertise (how to do the job well). Separate them and either can change without touching the other.

Part 5 · Reasoning & Planning

10. Plan Mode & Plan-First
Fixing a plan costs minutes. Fixing an implementation costs context, tokens, and git reverts. A read-only phase makes the agent prove it understands before it can change a thing.

11. Reasoning Budget: The Sandwich
Not every step needs the same depth of thought. Planning and verification are high-stakes; execution is mostly mechanical. Spend the compute where the ambiguity actually is.

Part 6 · Multi-Agent Loops

12. Orchestrator-Worker
One lead agent decomposes a task, fans it out to parallel workers, and synthesizes the results. The parallelism is the win, and the 15× token bill is the catch.

13. Evaluator-Optimizer
A generator produces, a critic judges, feedback recycles until a bar is met. The loop only works when the bar is machine-checkable, and only helps when the generator was weak to begin with.

Part 7 · Reversibility & Control

14. Reversibility & Idempotency
Agents produce bad output; that's a given. The design question is what recovery costs when they do, and whether a re-run cleans up or compounds the mess.

15. Steering Running Agents
A live run drifts toward the wrong file. You don't have to let it finish or throw it away; you can redirect it mid-flight and keep every bit of context it has already built.

Part 8 · Containment & Limits

16. Sandboxing & Blast-Radius Containment
Permissions decide what an agent is allowed to do. Sandboxing decides how much damage it can do anyway: when the permission rule fails, the injection lands, or the model simply misbehaves.

17. Compaction
Lesson 7 told a long-running agent to resume from a durable log instead of carrying everything in context. This is the move that keeps the in-session context worth carrying: replacing accumulated token mass with a dense summary before quality silently erodes.

18. Cost Controls & Circuit Breakers
Lesson 12 flagged the open problem: an orchestrator can burn ~15× the tokens of a chat, and a stuck agent will loop until the window fills. Here's the harness that caps the spend and trips the stop, the limit nobody taught you to wire in.

Part 9 · Measuring the Harness

19. Eval-Driven Harness Improvement
The whole course rests on one claim: the harness moves output quality more than the model. This is the lesson that makes the claim falsifiable: pin the model, change the environment, and read the score.

Capstone

20. The Harness Decision Table
Nineteen lessons, one reflex: read the symptom, reach for the harness move. This is the whole course as a lookup table, and a mixed review to prove it stuck.