Seeing the Whole Run

Nine techniques, one job: make the agent and the system it touches legible — solo or fanned-out. Here's the decision table for which to reach for, and a mixed review across the whole course.

Why this, for you: the failure mode this course fights is the silent one — an agent that did the wrong thing and you can't see it, or a fix that regressed something you didn't check. This capstone collapses the course into a single chooser: given a symptom, which technique do you reach for?

Every lesson was a way to make something visible: the agent's trajectory, the system's behavior, the shape of a failure, and, in Part 4, which agent did the work. Legibility is the through-line. The question in practice is never "which is best?" but "which one does this situation call for?"

1 The decision table

When you need to…	Reach for
Let the agent verify its own change worked	Three signal categories + verification ladder (L1)
Survive a context reset across sessions	Progress file + git trail; OTel for cost (L2)
Replay-verify and run concurrent agents safely	Event sourcing — append-only log as truth (L3)
Diagnose why the output is wrong	Four failure modes: see → told → do → tier (L4)
Stop a stuck or runaway agent	Edit-count + doom-loop + iteration cap / circuit breakers (L5)
Block a change that regresses behavior	Outcome-graded evals; trust the judge first (L6)
Find which source is filling the context window	Per-source attribution — rules/skills/MCP/subagent (L7)
Catch a wasted multi-agent run mid-trajectory	Failure-aware six-signal trace taxonomy (L8)
Query a fan-out trace by agent identity	Propagate `agent_id` on headers + spans (L9)
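
As a rough illustration, the table can be treated as a lookup keyed on the symptom. The sketch below is only one way to encode it: the `ROWS` list restates the table above, while the `reach_for` helper and its word-matching rule are hypothetical, not something the course prescribes.

```python
# Hypothetical encoding of the decision table: (need, lesson, technique).
ROWS = [
    ("Let the agent verify its own change worked", "L1",
     "Three signal categories + verification ladder"),
    ("Survive a context reset across sessions", "L2",
     "Progress file + git trail; OTel for cost"),
    ("Replay-verify and run concurrent agents safely", "L3",
     "Event sourcing (append-only log as truth)"),
    ("Diagnose why the output is wrong", "L4",
     "Four failure modes: see → told → do → tier"),
    ("Stop a stuck or runaway agent", "L5",
     "Edit-count + doom-loop + iteration cap / circuit breakers"),
    ("Block a change that regresses behavior", "L6",
     "Outcome-graded evals; trust the judge first"),
    ("Find which source is filling the context window", "L7",
     "Per-source attribution (rules/skills/MCP/subagent)"),
    ("Catch a wasted multi-agent run mid-trajectory", "L8",
     "Failure-aware six-signal trace taxonomy"),
    ("Query a fan-out trace by agent identity", "L9",
     "Propagate agent_id on headers + spans"),
]


def reach_for(symptom: str) -> list[tuple[str, str]]:
    """Return (lesson, technique) for rows whose need contains every word of the symptom."""
    words = symptom.lower().split()
    return [
        (lesson, technique)
        for need, lesson, technique in ROWS
        if all(word in need.lower() for word in words)
    ]


if __name__ == "__main__":
    for lesson, technique in reach_for("context window"):
        print(lesson, technique)
```

A query that matches several rows is a hint that the symptom is underspecified; narrow it before picking a technique.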

The recurring discipline across all nine: anchor "is it fixed?" to a deterministic signal — a test, a metric, a replayed log, a passing eval — never the model's own say-so. Every silent failure in this course traces back to trusting self-report over an external signal.
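
One way to make that discipline mechanical is a gate that decides "done" from a check's exit code alone. This is a minimal sketch under assumptions: `is_fixed`, `Verdict`, and the idea of passing the check as a command are illustrative names and choices, not part of any harness the course names. The model's self-report is accepted as an argument only so that it can be logged next to the real evidence; it never influences the verdict.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Verdict:
    done: bool
    evidence: str


def is_fixed(check_cmd: list[str], self_report: str = "") -> Verdict:
    """Decide 'done' from the check's exit code; self_report is recorded, never consulted."""
    result = subprocess.run(check_cmd, capture_output=True, text=True)
    evidence = (
        f"exit={result.returncode} cmd={check_cmd!r} "
        f"self_report_ignored={self_report!r}"
    )
    return Verdict(done=result.returncode == 0, evidence=evidence)


if __name__ == "__main__":
    # Stand-in for a real test command: a child process that exits non-zero.
    failing_check = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(is_fixed(failing_check, self_report="All tests pass now."))
```

In practice `check_cmd` would be the project's test runner, an eval suite, or a log replay; the point is only that the boolean comes from the external signal.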

2 How the pieces compose

They aren't alternatives so much as a stack. A trajectory log (L2) gives the debugger (L4) something to read. Loop detection (L5) writes the same trail. Evals (L6) gate the trace that observability (L1) produced. And Part 4 lifts the whole stack to fan-out: attribution (L7) finds the bloated source, the six signals (L8) say why a run is failing, and a propagated agent_id (L9) tells you which agent owns it.

# a single failing run, observed end to end
L1 signals    → metric shows error rate didn't drop after the "fix"
L2 progress   → trail shows the agent already tried this approach last session
L4 diagnose   → /memory: a stale user-level rule conflicted
L5 breaker    → doom-loop would have stopped the 4th identical retry
L6 eval gate  → held-out suite blocks the regression before merge
L8 six-signal → ToolErr=0.7 on a subagent flagged the waste mid-run
L9 agent_id   → the 429 storm traced to one subagent type, not the whole fan-out
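
The L5 line in that trace can be sketched as a small breaker that counts consecutive identical actions and total steps. The class name, both thresholds, and the use of a plain string as the action fingerprint are assumptions made for illustration; a real harness would pick its own fingerprint (for example, tool name plus normalized arguments) and tune the limits from observed data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopBreaker:
    max_repeats: int = 3       # assumed: identical attempts allowed in a row
    max_iterations: int = 50   # assumed: hard cap on total steps in a run
    _last: Optional[str] = field(default=None, init=False)
    _streak: int = field(default=0, init=False)
    _steps: int = field(default=0, init=False)

    def allow(self, action: str) -> bool:
        """Return False once the same action repeats too often or the step cap is hit."""
        self._steps += 1
        if action == self._last:
            self._streak += 1
        else:
            self._last = action
            self._streak = 1
        return self._streak <= self.max_repeats and self._steps <= self.max_iterations


if __name__ == "__main__":
    breaker = LoopBreaker()
    for attempt in range(1, 5):
        allowed = breaker.allow("pytest tests/test_api.py")
        print(f"attempt {attempt}: {'run' if allowed else 'stopped by breaker'}")
```

With the default of three repeats, the fourth identical retry is the first one refused, which matches the scenario in the trace.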

The one trap that spans every lesson

Each technique has a backfire mode, and they rhyme: an MCP outage reads as "all clear" (L1), a stale metric misleads post-deploy (L1), compaction silently drops a rule (L4), a nudge eats the context it was meant to save (L5), an unreliable judge amplifies its own errors (L6), an input-only denominator under-reports a full window (L7), six signals open six false-positive surfaces (L8), and a shell-out drops the agent_id header (L9). The instrument can lie. Instrument first, then add the intervention where data shows a real problem — never prophylactically.
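
Two of those backfire modes, the outage that reads as "all clear" and the stale metric, share a simple guard: treat a missing or old reading as unknown instead of healthy. The sketch below assumes a hypothetical `classify` helper and an arbitrary five-minute freshness limit; neither comes from the course.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    OK = "ok"
    FAILING = "failing"
    UNKNOWN = "unknown"


def classify(
    error_count: Optional[int],
    age_seconds: float,
    max_age_seconds: float = 300.0,  # assumed freshness limit
) -> Health:
    """Map a reading to a tri-state so silence or staleness is never reported as OK."""
    if error_count is None or age_seconds > max_age_seconds:
        return Health.UNKNOWN
    return Health.FAILING if error_count > 0 else Health.OK


if __name__ == "__main__":
    print(classify(None, age_seconds=0))      # source unreachable
    print(classify(0, age_seconds=3600))      # zero errors, but an hour old
    print(classify(0, age_seconds=10))        # fresh and clean
```

The design choice is that only a fresh, explicit zero counts as success; everything else forces a human or a second signal to look.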

↪ Your win: the whole course, one principle

Make it legible — the agent's run, the system's behavior, and which agent did the work.
Anchor "done" to a deterministic signal, never the model's self-report.
Match technique to symptom using the decision table — nine lessons compose into one stack.
Trust your instruments last — every one has a backfire mode that reads as success.
Scale the stack to fan-out — attribute the source, name the failure, query by agent_id.

Mixed review — the whole course, recall don't peek

Question 1 · Lesson 01
For a functional UI check, the cheaper, model-readable signal is…

Question 2 · Lesson 02
In Claude Code OTel, prompt.id exists to…

Question 3 · Lesson 07
Per-source context attribution exists so an operator can…

Question 4 · Lesson 08
The six-signal failure taxonomy is best understood as…

Question 5 · Lesson 09
In a fan-out trace, agent_id belongs on spans, not metrics, because…

Ask me anything. Want to walk one of your own stuck sessions through the decision table, or wire a progress file plus a loop-detection hook into your harness today? That's the course — go make a run legible.
