Observability · ~7 min
Every iteration of a stuck agent looks like progress from the inside. Three layers — edit-count, doom-loop, iteration cap — stop it before the context window burns.
Agents enter micro-loops: edit a file, run tests, see the same failure, edit the same file, see the same failure. Each pass looks like forward progress from the inside. Without intervention the agent exhausts its context retrying an approach that does not work.
Count edits per file path within a session. Past a threshold, inject a factual nudge — state the observation, don't dictate the fix. LangChain credits this class of harness-level intervention with moving their agent from rank 30 to rank 5 on Terminal Bench 2.0 without changing the model.
Edit-count misses a distinct mode: the same tool call returning the same error. Stack three detectors.
| Layer | Catches |
|---|---|
| Edit-count tracking | Repetitive editing of the same file |
| Doom-loop detection | Identical tool-call / error pairs — stops, doesn't nudge |
| Iteration cap | All remaining runaway execution, regardless of pattern |
Doom-loop detection terminates iteration rather than nudging, because identical failures will not self-resolve. The iteration cap is the backstop: pattern detectors miss non-identical but equally unproductive loops, so a hard per-conversation cap catches the rest.
Loop detection is intra-session. Circuit breakers are the broader stopping family: halt when progress stalls, return partial results, explain the stop.
maxTurns is runtime-enforced and cannot be overridden by model
reasoning. Instruction-level checks (signals 2, 3, 5) depend on the model obeying its own rules — fine for nudges,
unreliable for safety-critical stops. For those, prefer runtime enforcement or hooks.When a breaker trips, degrade gracefully: stop new actions, return the partial results already completed, explain what triggered the stop and what remains. Partial work beats a discarded session.
Across 220 instrumented runs, only half of 12 automated loop interventions actually reduced their target signal —
and one generated 13× more signal than it suppressed by tripping its own detector. False
positives fire on tight refactors and post-edit re-reads; every nudge eats context; a too-low maxTurns
cuts off legitimate 50+-turn work. Instrument first, add breakers where real loops show — not
prophylactically.
maxTurns for safety-critical stops — runtime beats instruction-level.Retrieval practice — recall, don't peek
Question 1A loop-detection nudge should…
Question 2Doom-loop detection targets…
Question 3For a safety-critical stop, prefer…
Question 4When a circuit breaker trips, the agent should…
Question 5 · spaced recall from Lesson 04The four agent-failure modes are missing context, conflicting instructions, missing tools, and…
settings.json, or how the
intra-session loop differs from the cross-session Ralph Wiggum loop (fresh context, same failed approach)? Next in
Part 3: Gates That Catch Regressions — evals.