Part 1 · The Trust Problem

Verifying Agent Work · ~7 min

The Confident Liar

"Build passed. Tests green. No issues found." The agent is fluent, confident, and — often enough to matter — wrong. This course is about never taking its word for it.

Why this, for you: the single most expensive habit in agent-assisted work is trusting a self-reported "done." Everything that follows — guardrails, ledgers, evals — exists to replace the agent's say-so with evidence you can check. This first lesson is the reframe the whole course hangs on.

Agent output is seductively plausible. Well-formatted prose with inline citations looks authoritative; code that compiles looks correct. Neither surface signal correlates reliably with correctness: fluency is a separate objective from accuracy.

1 Polish is not proof

The mistake is using output quality as a proxy for accuracy. An agent produces hallucinated URLs, fabricated statistics, and plausible-but-wrong claims — all in grammatically perfect prose. The danger peaks when the agent is almost right: a fully wrong answer is easy to catch; a mostly-correct answer with one subtle error propagates undetected.

The persuasion paradox

Users systematically overestimate LLM accuracy when shown default explanations, and longer explanations inflate confidence further without improving the answer (Steyvers et al., 2025, Nature Machine Intelligence). A follow-up found the same in human-AI teams: fluent explanations raised confidence with no accuracy gain beyond the prediction alone. The explanation persuades you; it doesn't verify the work.

2 Self-report is unfalsifiable

"Build passed. Tests green." is a claim made inside the conversation — and the conversation has no way to check itself. The agent may hallucinate that checks passed, skip steps silently, or assert a result without ever running the tool. Practitioners report agents that claim fixes for code that was never changed and insist tests pass when the transcript shows failures.

A checkpoint that reads the agent's own narration — "I fixed the bug," "all tests pass" — is not a checkpoint. Verification means checking against external ground truth, not re-reading the output.
# the trap: trusting the prose
Agent: "Done. Added email validation, RFC 5322 compliant. Tests pass."
# the check: run it against an edge case
assert is_valid_email("user+tag@example.com")
→ AssertionError
# the regex silently drops +tagged addresses. "Looks right" missed it.
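A minimal sketch of that kind of edge-case check, written out in full. The regex here is a hypothetical stand-in for what an agent might have produced, not a real implementation from any library; `AGENT_PATTERN`, `CASES`, and `failing_cases` are names invented for illustration. The point is that the reviewer picks the cases and the code reports disagreements, rather than the agent reporting on itself.

```python
import re

# Hypothetical stand-in for the agent's validator: the character class for the
# local part omits "+", so plus-tagged addresses are rejected.
AGENT_PATTERN = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def is_valid_email(address: str) -> bool:
    return AGENT_PATTERN.fullmatch(address) is not None


# Cases chosen by the reviewer, not by the agent: (address, expected result).
CASES = [
    ("user@example.com", True),
    ("user+tag@example.com", True),
    ("no-at-sign.example.com", False),
]


def failing_cases(validator, cases):
    """Return (address, expected, actual) for every case the validator gets wrong."""
    failures = []
    for address, expected in cases:
        actual = validator(address)
        if actual != expected:
            failures.append((address, expected, actual))
    return failures
```

An empty list from `failing_cases(is_valid_email, CASES)` would be evidence for these three cases only; it says nothing about full RFC 5322 compliance, which needs a much larger case table.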

3 Verify against ground truth, not the output

The fix is not to re-read what the agent wrote; it's to check it against something outside the agent.

If a property can be checked programmatically, check it automatically. Linters, type checkers, and test suites are verification, not overhead. The rest of this course is how to make that checking mechanical — run by the harness, not hoped for from the model.
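One way to make the check mechanical is to have the harness run the tools itself and judge them by exit code, never by the agent's summary. The sketch below assumes each check is a command that exits 0 on success; `CheckResult`, `run_check`, and `verify` are hypothetical names for illustration, not part of any framework.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_check(name, command, timeout=300):
    """Run one check command and judge it by its exit code alone.

    A missing executable or a timeout raises an exception, which the caller
    should treat as a failed check rather than swallow.
    """
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)


def verify(checks):
    """Run every (name, command) pair; return (all_passed, results)."""
    results = [run_check(name, command) for name, command in checks]
    return all(r.passed for r in results), results
```

In a real project the list passed to `verify` might look like `[("tests", [sys.executable, "-m", "pytest", "-q"]), ("types", ["mypy", "."])]`, assuming those tools are installed. The agent's "tests pass" never enters the decision; only the recorded exit codes do.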

Don't swing to universal paranoia

Constant verification has a cost. Applying production-auth scrutiny to a throwaway script destroys the productivity benefit. The failure on the other side is verification theater — running tests that don't cover the change, then treating a green suite as ground truth. The fix is calibrated verification: always verify high-stakes, irreversible, security-critical output; spot-check the proven; automate the rest.
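The calibration rule above can be written down as an explicit policy so that it is applied consistently rather than by mood. This is a sketch under stated assumptions: the `Change` flags and the `Level` names are hypothetical, and a real team would define its own criteria for "high-stakes" or "proven."

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    FULL = "full verification"
    SPOT_CHECK = "spot-check"
    AUTOMATED = "automated checks only"


@dataclass(frozen=True)
class Change:
    high_stakes: bool = False
    irreversible: bool = False
    security_critical: bool = False
    proven: bool = False  # this kind of task has a track record of passing checks


def verification_level(change: Change) -> Level:
    """Map a change to how much scrutiny it gets, riskiest criteria first."""
    if change.high_stakes or change.irreversible or change.security_critical:
        return Level.FULL
    if change.proven:
        return Level.SPOT_CHECK
    return Level.AUTOMATED
```

The risk flags are tested before `proven`, so a security-critical change gets full verification even when the task type has a clean history.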

↪ Your win: treat "done" as a claim to be checked

Retrieval practice — recall, don't peek

Question 1: Polished, fluent agent output correlates with…

Question 2: An agent is most dangerous when its answer is…

Question 3: A checkpoint that only reads the agent's "all tests pass" narration is…

Question 4: Longer agent explanations were shown to…

Question 5: To verify a cited claim, the right move is to…

Ask me anything. Want a quick audit of where your current workflow trusts a self-reported "done," or a checklist for which outputs deserve full verification versus a spot-check? Next in Part 1: Guardrails Beat Guidance — making the check deterministic so the model gets no vote.