Observability · ~9 min
Nine techniques, one job: make the agent and the system it touches legible — solo or fanned-out. Here's the decision table for which to reach for, and a mixed review across the whole course.
Every lesson was a way to make something visible — the agent's trajectory, the system's behavior, the shape of a failure, and in Part 4 the work of which agent. Legibility is the through-line. The question in practice is never "which is best?" but "which one does this situation call for?"
| When you need to… | Reach for |
|---|---|
| Let the agent verify its own change worked | Three signal categories + verification ladder (L1) |
| Survive a context reset across sessions | Progress file + git trail; OTel for cost (L2) |
| Replay-verify and run concurrent agents safely | Event sourcing — append-only log as truth (L3) |
| Diagnose why the output is wrong | Four failure modes: see → told → do → tier (L4) |
| Stop a stuck or runaway agent | Edit-count + doom-loop + iteration cap / circuit breakers (L5) |
| Block a change that regresses behavior | Outcome-graded evals; trust the judge first (L6) |
| Find which source is filling the context window | Per-source attribution — rules/skills/MCP/subagent (L7) |
| Catch a wasted multi-agent run mid-trajectory | Failure-aware six-signal trace taxonomy (L8) |
| Query a fan-out trace by agent identity | Propagate agent_id on headers + spans (L9) |
They aren't alternatives so much as a stack. A trajectory log (L2) gives the debugger (L4) something to read. Loop
detection (L5) writes the same trail. Evals (L6) gate the trace that observability (L1) produced. And Part 4 lifts the
whole stack to fan-out: attribution (L7) finds the bloated source, the six signals (L8) say why a run is
failing, and a propagated agent_id (L9) tells you which agent owns it.
Each technique has a backfire mode, and they rhyme: an MCP outage reads as "all clear" (L1), a stale metric
misleads post-deploy (L1), compaction silently drops a rule (L4), a nudge eats the context it was meant to save
(L5), an unreliable judge amplifies its own errors (L6), an input-only denominator under-reports a full window (L7),
six signals open six false-positive surfaces (L8), and a shell-out drops the agent_id header (L9). The
instrument can lie. Instrument first, then add the intervention where data shows a real problem —
never prophylactically.
agent_id.Mixed review — the whole course, recall don't peek
Question 1 · Lesson 01For a functional UI check, the cheaper, model-readable signal is…
Question 2 · Lesson 02In Claude Code OTel, prompt.id exists to…
Question 3 · Lesson 07Per-source context attribution exists so an operator can…
Question 4 · Lesson 08The six-signal failure taxonomy is best understood as…
Question 5 · Lesson 09In a fan-out trace, agent_id belongs on spans, not metrics, because…