Cost Controls & Circuit Breakers

Lesson 12 flagged the open problem: an orchestrator can burn ~15× the tokens of a chat, and a stuck agent will loop until the window fills. Here's the harness that caps the spend and trips the stop — the limit nobody taught you to wire in.

Why this, for you: the multi-agent lessons celebrated fan-out and iteration but left a hole — what stops a worker that's looping, or a cascade of agents that quietly runs up the bill? Cost is the third limit, after blast radius and context. This lesson gives you both halves: routing each task to the cheapest model that meets it, and a circuit breaker that halts a loop the model can't stop on its own.

A cost control routes work to the cheapest tier that meets the task and caps the budget it can spend. A circuit breaker halts an agent loop when progress stalls — repeated errors, runaway cost, context exhaustion, or circular behavior. One bounds spend by design; the other bounds it by detection.

1 Route by complexity, escalate on failure

Model cost scales with tier and token volume. Top-tier models on every task waste compute; cheap models on complex tasks produce rework. The fix is to match capability to the task and pay up only when you must.

Task	Tier
File search, exploration — high volume, low reasoning	Fast (e.g. Haiku)
Code implementation — balanced capability and speed	Balanced (e.g. Sonnet)
Architecture, complex refactoring — deep reasoning	Powerful (e.g. Opus)
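
The table above can be written as a plain lookup. This is a minimal sketch, not a real tool's config: the task labels, the `Tier` names, and the fallback to the balanced tier are all assumptions made for illustration.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"          # e.g. Haiku
    BALANCED = "balanced"  # e.g. Sonnet
    POWERFUL = "powerful"  # e.g. Opus


# Hypothetical task labels; use whatever categories your harness assigns.
ROUTES = {
    "file_search": Tier.FAST,
    "exploration": Tier.FAST,
    "implementation": Tier.BALANCED,
    "architecture": Tier.POWERFUL,
    "complex_refactor": Tier.POWERFUL,
}


def route(task_kind: str) -> Tier:
    """Return the tier listed for this kind of task.

    Unknown kinds fall back to BALANCED (an assumption of this sketch).
    """
    return ROUTES.get(task_kind, Tier.BALANCED)
```

Keeping the mapping in data rather than in branching code makes it easy to audit which work is allowed to reach the expensive tier.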

The savings are large, and there are numbers behind them. A community analysis of the big.LITTLE pattern (Haiku for read-only exploration while a stronger model reasons) reports a 2–2.5× cost reduction at 85–95% quality on mixed workloads. FrugalGPT's cascade (query the cheap model first, escalate only on low confidence) reported up to 98% cost reduction while matching the best single model on its benchmarks. The cascade isn't native to coding tools, but you can approximate it: run the fast model, check the result with a deterministic gate (tests, linter, type checker), and escalate to the capable model only on failure.
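
The approximated cascade might look like the sketch below. It assumes each model is a hypothetical callable that takes a task string and returns output, and that `gate` is a deterministic check (a wrapper around your test runner, say) that returns `True` on pass. None of these names come from a real library.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CascadeResult:
    output: str
    tier_index: int  # position in `models` of the model that produced `output`
    passed: bool     # whether that output passed the gate


def cascade(
    task: str,
    models: Sequence[Callable[[str], str]],
    gate: Callable[[str], bool],
) -> CascadeResult:
    """Try models cheapest-first and escalate only when the gate fails."""
    if not models:
        raise ValueError("need at least one model")
    output = ""
    for index, model in enumerate(models):
        output = model(task)
        if gate(output):
            return CascadeResult(output, index, True)
    # Every tier failed the gate: hand back the last attempt, marked as failed.
    return CascadeResult(output, len(models) - 1, False)
```

The escalation signal here is the gate's result, not the model's own confidence, which is what keeps the decision cheap and deterministic.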

This rhymes with the reasoning sandwich of Lesson 11: spend the expensive resource where ambiguity is highest. Here the resource is dollars instead of reasoning tokens, but the discipline is identical — escalate on a cheap, deterministic signal, never on habit.

2 The breaker: five signals to stop a loop

Routing bounds the cost of healthy work. A stuck agent isn't healthy — it applies the same failed fix, retries a flaky test twenty times, consumes resources without progress until the window fills or the session is killed. A circuit breaker watches for the stall and halts it.

Signal	Trips when…
Iteration limit	The agent has taken N steps without completing — `maxTurns` enforces it at the runtime level
Repeated failure	The same call fails the same way — a 429 three times running will keep 429-ing
Repetition	The agent re-fetches a URL or re-reads a file with no new information — a stuck loop
Context budget	The window approaches the dumb zone of Lesson 17 — trip on dropping recall, not a fixed count
Cost threshold	Spend exceeds the expected budget — overrun often correlates with looping
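
The five signals can be gathered into one small object that the harness calls after every step. This is a sketch under stated assumptions: the thresholds are placeholder values, the class and method names are hypothetical, the repetition check is deliberately naive, and the context check uses fraction of window consumed as a stand-in because measuring recall is beyond a short example.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class CircuitBreaker:
    max_turns: int = 20                # iteration limit
    max_same_failure: int = 3          # identical failures in a row
    max_context_fraction: float = 0.8  # assumed proxy for the context budget
    max_cost_usd: float = 5.0          # placeholder budget
    turns: int = 0
    cost_usd: float = 0.0
    failure_streak: int = 0
    last_failure: Optional[str] = None
    seen_ok: Set[str] = field(default_factory=set)

    def record(self, action: str, error: Optional[str] = None,
               cost_usd: float = 0.0,
               context_fraction: float = 0.0) -> Optional[str]:
        """Record one step; return the name of the tripped signal, or None."""
        self.turns += 1
        self.cost_usd += cost_usd

        repeated_ok = False
        if error is None:
            self.last_failure, self.failure_streak = None, 0
            # Naive: any successful action seen before counts as repetition.
            repeated_ok = action in self.seen_ok
            self.seen_ok.add(action)
        else:
            key = f"{action}|{error}"
            same = key == self.last_failure
            self.failure_streak = self.failure_streak + 1 if same else 1
            self.last_failure = key

        if self.failure_streak >= self.max_same_failure:
            return "repeated_failure"
        if repeated_ok:
            return "repetition"
        if context_fraction >= self.max_context_fraction:
            return "context_budget"
        if self.cost_usd > self.max_cost_usd:
            return "cost_threshold"
        if self.turns >= self.max_turns:
            return "iteration_limit"
        return None
```

A harness would call `record(...)` once per tool call and stop issuing new actions as soon as it returns a signal name.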

When a breaker trips, degrade gracefully: stop new actions, return the partial results already completed, explain what triggered the stop and what remains, and escalate to a human if the pipeline has a gate. Partial results are more useful than nothing — never discard completed work.
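
One way to make graceful degradation concrete is to have the harness return a structured report instead of raising. The sketch below assumes a hypothetical plan expressed as an ordered list of step names and a dict of results for the steps that finished. The field names are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StopReport:
    signal: str                # which breaker signal tripped
    completed: Dict[str, Any]  # partial results, preserved as-is
    remaining: List[str]       # planned steps that never finished
    needs_human: bool          # escalate if the pipeline has a gate


def degrade(signal: str, plan: List[str], results: Dict[str, Any],
            has_human_gate: bool) -> StopReport:
    """Build the report a tripped breaker returns instead of nothing."""
    completed = {step: results[step] for step in plan if step in results}
    remaining = [step for step in plan if step not in results]
    return StopReport(signal, completed, remaining, has_human_gate)
```

The caller gets everything that was finished, a list of what was not, and the reason for the stop in a single value.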

3 Runtime stops beat instruction stops

Where the breaker is enforced decides whether it can be ignored. `maxTurns` and session cost budgets are enforced at the runtime: the model gets no vote, exactly like the hooks and permission rules of Parts 2–3. Error-rate and repetition checks written as agent instructions depend on the model reading and obeying its own rules; if it ignores them mid-reasoning, the stop never fires. Hooks sit in between: deterministic scripts that monitor the session and trigger a stop, so they are only as good as the checks you script.

# research-agent frontmatter: runtime-enforced ceiling
maxTurns: 20   # cannot be overridden by model reasoning

# system-prompt rules: model-dependent, for what runtime can't see
- same URL errors 3× in a row -> skip it, note unreachable, move on
- about to re-fetch a URL already fetched -> stop, return what you have

# safety-critical stops belong at the runtime, not in the prompt
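
To see why a runtime ceiling cannot be argued with, it helps to look at where the counter lives. In this sketch the loop belongs to the harness, and `step` is a hypothetical callable standing in for one model turn that returns `(done, output)`. Nothing the model emits can change `max_turns`.

```python
from typing import Any, Callable, Dict, Tuple


def run_with_ceiling(step: Callable[[int], Tuple[bool, Any]],
                     max_turns: int = 20) -> Dict[str, Any]:
    """Drive an agent loop whose turn limit is held outside the model."""
    results = []
    for turn in range(max_turns):
        done, output = step(turn)
        results.append(output)
        if done:
            return {"status": "completed", "turns": turn + 1,
                    "results": results}
    # Ceiling reached: stop, but keep everything produced so far.
    return {"status": "max_turns", "turns": max_turns, "results": results}
```

A prompt rule, by contrast, is only text inside `step`: it takes effect only if the model chooses to follow it.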

This is also the kill path Lesson 16 demanded. A narrowly-scoped agent still accumulates time-integrated damage between detection and shutdown — and a Kiteworks 2026 report found 60% of organizations can't terminate a misbehaving agent at all. A runtime-enforced breaker is the termination path the agent itself cannot block, closing the loop sandboxing left open.

Set too aggressively, the breaker becomes the failure

Circuit breakers are failure-mode detectors, not correctness guarantees. A low maxTurns cuts off legitimate multi-step refactors — production frameworks have open issues where agents halt mid-task on "max iterations" while still making progress. Naive repetition checks fire on valid re-reads (re-reading a file after an edit, refetching after a 429 backoff). A hard cost cap trips on a successful exploration run as readily as on a loop — the signal is cost without progress, not cost alone. The steelman: if your agents already fail gracefully, another stopping layer mostly adds false positives. Instrument first; add breakers where instrumentation shows real loops, not prophylactically.
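
Two of these false positives can be reduced by making the checks progress-aware. The sketch below is one possible approach, with hypothetical names throughout: a read tracker that treats a re-read after an edit as new information, and a spend check that counts only cost accrued since the last step that made progress. How "progress" is judged (a newly passing test, a changed diff) is left to the harness.

```python
from typing import Dict, Iterable, Tuple


class ReadTracker:
    """Flags a re-read only when the file has not changed since the last read."""

    def __init__(self) -> None:
        self._version: Dict[str, int] = {}  # path -> number of edits seen
        self._read_at: Dict[str, int] = {}  # path -> version at last read

    def edited(self, path: str) -> None:
        self._version[path] = self._version.get(path, 0) + 1

    def is_redundant_read(self, path: str) -> bool:
        version = self._version.get(path, 0)
        redundant = self._read_at.get(path) == version
        self._read_at[path] = version
        return redundant


def stalled_spend(steps: Iterable[Tuple[float, bool]],
                  budget_usd: float) -> bool:
    """True if cost accrued since the last progressing step exceeds the budget.

    `steps` is a sequence of (cost_usd, made_progress) pairs in order.
    """
    since_progress = 0.0
    for cost, made_progress in steps:
        since_progress = 0.0 if made_progress else since_progress + cost
    return since_progress > budget_usd
```

Neither check removes false positives entirely. They only narrow the trigger from "repeated" or "expensive" to "repeated or expensive with nothing to show for it".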

↪ Your win: cap the spend, trip the stall

Route by complexity — fast for exploration, balanced for implementation, powerful for architecture.
Escalate on a deterministic gate — cheap model first, tests/linter/type-check, then escalate on failure.
Wire five stop signals — iteration, repeated failure, repetition, context budget, cost threshold.
Prefer runtime enforcement for safety stops — maxTurns and cost budgets the model can't override.
Degrade gracefully — return partial results and what triggered the stop; never discard completed work.

Retrieval practice — recall, don't peek

Question 1: Cost-aware routing sends high-volume, low-reasoning exploration to the…

Question 2: Cascade routing escalates to the capable model only after…

Question 3: Which stop signal is enforced at the runtime, not by instruction?

Question 4: When a circuit breaker trips, the agent should…

Question 5 (spaced recall from Lesson 12): Orchestrator-worker fan-out is worth its ~15× token cost only when subtasks are…

Ask me anything. Want a routing config that pins Haiku to exploration and Sonnet to implementation, or a breaker setup that pairs maxTurns with a cost budget for a worker pool? Next, the Capstone: a symptom→move table that folds all nineteen lessons into one decision tool.
