Part 1 · The Trust Problem

Verifying Agent Work · ~7 min

Guardrails Beat Guidance

"Don't break any links" is a prompt — sometimes ignored. A link checker on every URL is a guardrail — it runs every time, and the model gets no vote.

Why this, for you: Lesson 1 said verify against ground truth. This is the mechanism. A guardrail is a deterministic check that passes or fails for every output, forever — the difference between hoping the agent behaves and enforcing a property of the result. It's the load-bearing layer under everything that follows.

Agents are probabilistic; they will sometimes produce bad output. Prompts guide behavior — probabilistic, occasionally skipped. Guardrails enforce output properties — deterministic, always run, cannot be reasoned around. Use both, but never confuse one for the other.

1 The core distinction

A property that can be checked programmatically and must always hold belongs in a guardrail, not a prompt. "Don't include broken links" in a system prompt is a suggestion; a pre-commit hook that curls every URL is a guarantee.

This is mechanical enforcement: a violation is made impossible or immediately visible, rather than merely requested in a prompt. Linters, structural tests, CI gates, and hooks run regardless of the model's choices.
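The link check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production checker: the function names (`extract_urls`, `url_resolves`, `broken_links`, `main`) and the URL regex are hypothetical, and the resolver is injectable so the logic can be exercised without network access.

```python
import http.client
import re
import sys
import urllib.request
from typing import Callable, Iterable, List

# Assumption: a deliberately simple pattern; real documents may need a stricter one.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> List[str]:
    """Return every http(s) URL in the text, in first-seen order, without duplicates."""
    seen = {}
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that trails a URL in running prose.
        seen.setdefault(match.group(0).rstrip(".,;:"), None)
    return list(seen)


def url_resolves(url: str, timeout: float = 5.0) -> bool:
    """True if a HEAD request succeeds with a status below 400."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        return False


def broken_links(text: str, resolves: Callable[[str], bool] = url_resolves) -> List[str]:
    """Return the URLs in the text that the resolver rejects."""
    return [url for url in extract_urls(text) if not resolves(url)]


def main(paths: Iterable[str], resolves: Callable[[str], bool] = url_resolves) -> int:
    """Check each file; return 1 if any link is broken so the hook can fail the commit."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for url in broken_links(handle.read(), resolves):
                print(f"{path}: broken link {url}", file=sys.stderr)
                failed = True
    return 1 if failed else 0
```

Wired into a hook, the script would end with `sys.exit(main(sys.argv[1:]))`; the nonzero exit code is what turns the check from advice into a gate.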

2 Four categories of guardrail

# PostToolUse hook: fires after every Write, model gets no vote
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          { "type": "command", "command": "ruff check $CLAUDE_TOOL_INPUT_PATH" }
        ]
      }
    ]
  }
}
# catches the issue file-by-file as the agent writes, not after the batch

3 Layer them — each catches what the last missed

No single guardrail catches everything. Pre-commit hooks catch obvious errors fast, before they enter the repo. CI gates catch integration errors that only manifest in the full build. Human review catches semantic errors no automated tool can assess. Each layer is independent.
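One way to picture the layering is as an ordered list of check groups, where output only reaches a later layer after it clears the earlier one. The sketch below is illustrative only: `Layer`, `run_layers`, and both toy checks are hypothetical stand-ins for real pre-commit and CI tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A check takes the text under review and returns a list of findings (empty = pass).
Check = Callable[[str], List[str]]


@dataclass
class Layer:
    name: str
    checks: List[Check]


def no_trailing_whitespace(text: str) -> List[str]:
    """Toy stand-in for a fast, local pre-commit check."""
    return [
        f"line {number}: trailing whitespace"
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip()
    ]


def defines_entry_point(text: str) -> List[str]:
    """Toy stand-in for a slower check that only makes sense on the whole build."""
    return [] if "def main(" in text else ["no main() entry point defined"]


def run_layers(text: str, layers: List[Layer]) -> Tuple[Optional[str], List[str]]:
    """Run layers in order; return (first failing layer name, its findings)."""
    for layer in layers:
        findings = [msg for check in layer.checks for msg in check(text)]
        if findings:
            return layer.name, findings
    return None, []


LAYERS = [
    Layer("pre-commit", [no_trailing_whitespace]),
    Layer("ci", [defines_entry_point]),
]
```

Stopping at the first failing layer mirrors how a commit rejected by a pre-commit hook never reaches CI. Human review sits after both layers and is intentionally not modeled here.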

But know the ceiling: guardrails check properties, not intent. A URL validator confirms that a link resolves, not that it points to the claimed content. A linter confirms syntax, not logic. A schema validator confirms structure, not correctness. Make each check as specific as the property it protects: "matches the required schema with all fields present" beats "is valid YAML."
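The gap between a loose check and a specific one shows up clearly in code. The sketch below uses JSON rather than YAML only so that it runs on the standard library alone; `REQUIRED_FIELDS` is a hypothetical schema, and the two function names are illustrative.

```python
import json
from typing import List

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"title": str, "url": str, "tags": list}


def is_valid_json(raw: str) -> bool:
    """Loose guardrail: only confirms that the text parses."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return False
    return True


def schema_violations(raw: str) -> List[str]:
    """Specific guardrail: parses, then checks every required field and its type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be an object"]
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"field {name} must be {expected.__name__}")
    return problems
```

An output such as `{"title": "Guardrails"}` passes `is_valid_json` but is rejected by `schema_violations`, which is the difference the paragraph above describes. Neither check says anything about whether the title is accurate; that remains outside what a structural guardrail can see.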

When guardrails backfire

Thin-but-visible coverage creates the impression of verification without covering what actually ships — false confidence. CI latency that dominates the loop pushes agents to batch fixes into larger, less reviewable diffs (move heavy checks to merge gates). Hook noise on legitimate exploratory work trains --no-verify bypass — and once bypass is normalized the guarantee is gone. And a guardrail can drift from the property it once protected; it needs the same maintenance as the code it guards.

↪ Your win: enforce properties, don't request them

Retrieval practice — recall, don't peek

Question 1: The defining property of a guardrail versus a prompt is that it…

Question 2: A schema validator confirms a structured output's…

Question 3: Noisy hooks that fire on legitimate work tend to train operators to…

Question 4: CI gates differ from pre-commit hooks mainly by being…

Question 5 · spaced recall from Lesson 1: A checkpoint that reads the agent's "all tests pass" narration is…

Ask me anything. Want a starter .pre-commit-config.yaml plus a CI gate and a PostToolUse hook wired together, or help deciding which of your prompt rules should be promoted to guardrails? Next in Part 1: The Verification Ledger — turning every check into a row you can query.