Well-formatted prose with inline citations looks authoritative. Code that compiles looks correct. Neither feeling is evidence.
Why this, for you: the yes-man failure was the agent not checking itself; this is
you not checking the agent. It's the human half of the same loop. The fix is to verify against external
ground truth — and to know which outputs are worth the cost.
Agent output is seductively plausible. Fluent prose, clean formatting, confident tone — none of
these surface signals correlate reliably with correctness. The mistake is using output quality as a proxy for
accuracy.
1 What it looks like
You ask for email validation. The agent returns a clean function with a comment citing "RFC 5322 compliance":
# looks authoritative — cites a spec, compiles, reads clean
pattern = r"^[a-zA-Z0-9._%\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$"
# merged on sight. then, in production:
is_valid_email("user+tag@example.com") → False (a valid address)
The regex silently rejects +-tagged addresses — a subtle bug hidden behind authoritative
presentation. One assert against an edge case would have caught it before merge.
Output fluency is independent of correctness. Agents are most dangerous when almost
right — a fully wrong answer is easy to catch; a mostly correct one with a single subtle error propagates undetected.
2 Why it happens
Fluency, formatting, and confidence get mistaken for correctness. Agents are trained to produce coherent
responses — a separate objective from accuracy. Worse, users systematically overestimate LLM
accuracy when shown default explanations, and longer explanations inflate confidence further without
improving the answer (Steyvers et al., 2025).
3 The fix
Verify independently — not by re-reading the output, but by checking against external ground truth:
fetch cited URLs confirm the source exists and says what's claimed
run the code "compiles" and "correct" are different properties
cross-reference check official docs, not the agent's summary of them
review the diff diffs are easier to verify than full artifacts
# if it can be checked programmatically, automate it —# linters, type checkers, and tests are verification, not overhead.
Apply progressive trust: verify everything from new configurations, spot-check proven ones, and
always verify high-stakes outputs (security, production config, financial data) regardless of track record.
Don't over-verify either
Constant verification has a cost. Verification theater — running tests that don't cover the
change, then trusting a green suite — is motion without substance. Alert fatigue from over-firing
checks trains reviewers to dismiss real failures (the cry-wolf mode again). And applying production-auth scrutiny
to a throwaway script destroys the productivity benefit. The fix is calibrated verification, not
universal paranoia.
↪ Your win: check ground truth, not vibes
"Looks right" is not a verification method — fluency is independent of correctness.
Most dangerous when almost right — the subtle error is the one that ships.
Verify against external truth: fetch URLs, run code, cross-reference docs, review diffs.
Automate what you can — make verification the default, not the exception.
Calibrate by stakes — progressive trust, not theater or fatigue.
Retrieval practice — recall, don't peek
Question 1Output fluency, relative to correctness, is…
Question 2An agent is most dangerous when its answer is…
Question 3The right way to verify a cited source is to…
Question 4Running tests that don't cover the change is…
Question 5 · spaced recall from Lesson 06"Preserve tokens" backfires because it…
Ask me anything. Want a progressive-trust policy by output type, or how this pairs with the
separate-reviewer pattern from Lesson 04? Next, Part 3 turns to configuration: the six smells hiding in
your AGENTS.md.