Trust Without Verify

Well-formatted prose with inline citations looks authoritative. Code that compiles looks correct. Neither feeling is evidence.

Why this, for you: the yes-man failure was the agent not checking itself; this is you not checking the agent. It's the human half of the same loop. The fix is to verify against external ground truth — and to know which outputs are worth the cost.

Agent output is seductively plausible. Fluent prose, clean formatting, confident tone — none of these surface signals correlate reliably with correctness. The mistake is using output quality as a proxy for accuracy.

1 What it looks like

You ask for email validation. The agent returns a clean function with a comment citing "RFC 5322 compliance":

# looks authoritative — cites a spec, compiles, reads clean pattern = r"^[a-zA-Z0-9._%\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$" # merged on sight. then, in production: is_valid_email("user+tag@example.com") → False (a valid address)

The regex silently rejects +-tagged addresses — a subtle bug hidden behind authoritative presentation. One assert against an edge case would have caught it before merge.

Output fluency is independent of correctness. Agents are most dangerous when almost right — a fully wrong answer is easy to catch; a mostly correct one with a single subtle error propagates undetected.

2 Why it happens

Fluency, formatting, and confidence get mistaken for correctness. Agents are trained to produce coherent responses — a separate objective from accuracy. Worse, users systematically overestimate LLM accuracy when shown default explanations, and longer explanations inflate confidence further without improving the answer (Steyvers et al., 2025).

3 The fix

Verify independently — not by re-reading the output, but by checking against external ground truth:

fetch cited URLs confirm the source exists and says what's claimed run the code "compiles" and "correct" are different properties cross-reference check official docs, not the agent's summary of them review the diff diffs are easier to verify than full artifacts # if it can be checked programmatically, automate it — # linters, type checkers, and tests are verification, not overhead.

Apply progressive trust: verify everything from new configurations, spot-check proven ones, and always verify high-stakes outputs (security, production config, financial data) regardless of track record.

Don't over-verify either

Constant verification has a cost. Verification theater — running tests that don't cover the change, then trusting a green suite — is motion without substance. Alert fatigue from over-firing checks trains reviewers to dismiss real failures (the cry-wolf mode again). And applying production-auth scrutiny to a throwaway script destroys the productivity benefit. The fix is calibrated verification, not universal paranoia.

↪ Your win: check ground truth, not vibes

"Looks right" is not a verification method — fluency is independent of correctness.
Most dangerous when almost right — the subtle error is the one that ships.
Verify against external truth: fetch URLs, run code, cross-reference docs, review diffs.
Automate what you can — make verification the default, not the exception.
Calibrate by stakes — progressive trust, not theater or fatigue.

Retrieval practice — recall, don't peek

Question 1Output fluency, relative to correctness, is…

Question 2An agent is most dangerous when its answer is…

Question 3The right way to verify a cited source is to…

Question 4Running tests that don't cover the change is…

Question 5 · spaced recall from Lesson 06"Preserve tokens" backfires because it…

Ask me anything. Want a progressive-trust policy by output type, or how this pairs with the separate-reviewer pattern from Lesson 04? Next, Part 3 turns to configuration: the six smells hiding in your AGENTS.md.