Remember, Don't Re-Read

A long agent loop that replays its whole transcript every turn re-bills every prior observation on every call. Lift the record out of the transcript and the cost curve bends from quadratic to linear.

Why this, for you: if you build unattended loops (tuning, optimization, autonomous research), this is a cost lever that doesn't touch the model. A stateless ReAct loop pays O(N²) tokens over N steps because every step re-sends every earlier observation. Move the experimental record into typed state outside the prompt and the same loop runs in O(N). On the two benchmarks below, that cut tokens by roughly half to 90% at comparable quality.

A stateless ReAct loop appends each Thought–Action–Observation triple to the message history, then re-sends the whole transcript on the next call. Per-call input grows linearly with step n; total cost across N steps is O(N²) — every prior observation re-billed on every subsequent inference.
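A toy cost model makes the quadratic term concrete. It is a sketch under simplifying assumptions: a fixed prefix of `prefix` tokens, every observation the same size `obs`, a bounded `working_set` read from state in the stateful case, and nothing else (Thought/Action text, outputs) counted. The function names are illustrative, not from any library.

```python
def stateless_tokens(n_steps: int, prefix: int, obs: int) -> int:
    """Input tokens summed over n_steps calls that each replay the transcript.

    Call k (k = 1..N) re-sends the fixed prefix plus the k - 1 earlier
    observations, so the total is N*prefix + obs*N*(N - 1)/2: quadratic in N.
    """
    return sum(prefix + obs * (k - 1) for k in range(1, n_steps + 1))


def stateful_tokens(n_steps: int, prefix: int, working_set: int) -> int:
    """Input tokens when each call sends the prefix plus a bounded slice of
    external state instead of the history: linear in N."""
    return n_steps * (prefix + working_set)
```

With a 500-token prefix, 2,000-token observations and a 300-token working set, doubling N from 20 to 40 takes the stateless total from 390,000 to 1,580,000 tokens (about 4×), while the stateful total only doubles, from 16,000 to 32,000.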

1 The fix: state lives outside the prompt

State-carry lifts the experimental record out of the transcript into a typed object that lives outside the prompt. The agent reads specific fields via a tool only when the current decision needs them. The conversation window stays roughly fixed-size, and total cost across N steps becomes O(N).

The measured gap on two benchmarks (Jabbarvaziri, 2026): hyperparameter tuning over 15 iterations with small observations fell from 24,465 to 2,492 tokens (a 90% cut); code optimization over 40 iterations with large observations fell from 1,275K to 627K tokens (roughly half). Optimization quality was comparable on both, so the token cut did not degrade outcomes.

The saving is purely about where state lives, not about model reasoning. A stateless loop encodes the record in the transcript, which the inference call must re-process every turn. A stateful loop encodes it in an object outside the prompt, so per-call input is bounded by the working set the current step touches — not by cumulative history. The O(N²)→O(N) shift is a direct consequence of decoupling the record from the transcript.
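A minimal sketch of that bound, assuming a hypothetical `LoopState` record and a `build_prompt` helper (neither is from the reference implementation). The `needed` list stands in for whatever fields the agent requested through its state-read tool on this step; the ever-growing `history` never enters the prompt unless it is asked for.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LoopState:
    """Hypothetical experiment record that lives outside the prompt."""
    best_metric: Optional[float] = None
    best_params: dict[str, Any] = field(default_factory=dict)
    history: list[dict[str, Any]] = field(default_factory=list)  # grows every step


def build_prompt(system: str, recent_turns: list[str], state: LoopState,
                 needed: list[str], window: int = 3) -> str:
    """Assemble one call's input from a bounded working set: the fixed system
    text, the last `window` turns, and only the requested state fields."""
    parts = [system]
    parts.extend(recent_turns[-window:] if window > 0 else [])
    for name in needed:
        parts.append(f"{name} = {getattr(state, name)!r}")
    return "\n".join(parts)
```

Appending thousands more entries to `state.history` leaves the prompt unchanged as long as the step only requests `best_metric`, which is the decoupling described above.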

2 How to apply it, framework-agnostic

The reference implementation uses LangGraph, but the pattern — state outside the prompt, reached by tool call — is framework-agnostic:

Component	What you build
Typed state object	The loop's record: best metric, last params, recent failure traces, working files — keep fields minimal
State-read tools	`read_state(field)`, `update_state(field, v)`, `list_recent_attempts(n)`
A checkpointer	Persist via Redis / Postgres / DynamoDB — not in-memory, which loses everything on restart
A trimmed window	Keep only recent turns in the transcript; the state object is the source of truth
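The four rows above can be sketched in plain Python for a tuning loop. This is an illustration under stated assumptions, not LangGraph's API: the `TuningState` fields, `SqliteCheckpointer`, `StateTools`, and `trim_window` are hypothetical names, the three tool methods follow the signatures in the table, and SQLite stands in for Redis/Postgres/DynamoDB only so the sketch runs without a server.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class TuningState:
    """Minimal typed record for a tuning loop (hypothetical field set)."""
    best_metric: Optional[float] = None
    last_params: dict[str, Any] = field(default_factory=dict)
    attempts: list[dict[str, Any]] = field(default_factory=list)


class SqliteCheckpointer:
    """Stores the state as JSON, one row per thread. SQLite is a local
    stand-in for a durable store; the point is surviving a restart."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints (thread TEXT PRIMARY KEY, state TEXT)"
        )

    def save(self, thread: str, state: TuningState) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
            (thread, json.dumps(asdict(state))),
        )
        self.conn.commit()

    def load(self, thread: str) -> TuningState:
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread = ?", (thread,)
        ).fetchone()
        return TuningState(**json.loads(row[0])) if row else TuningState()


class StateTools:
    """The three state tools, exposed to the agent instead of the transcript."""

    def __init__(self, store: SqliteCheckpointer, thread: str) -> None:
        self.store, self.thread = store, thread
        self.state = store.load(thread)  # resume from the last checkpoint, if any

    def read_state(self, name: str) -> Any:
        return getattr(self.state, name)

    def update_state(self, name: str, value: Any) -> None:
        if not hasattr(self.state, name):
            raise KeyError(f"unknown state field: {name}")
        setattr(self.state, name, value)
        self.store.save(self.thread, self.state)  # checkpoint on every write

    def list_recent_attempts(self, n: int) -> list[dict[str, Any]]:
        return self.state.attempts[-n:] if n > 0 else []


def trim_window(messages: list[str], keep: int) -> list[str]:
    """Keep only the last `keep` turns; the state object is the source of truth."""
    return messages[-keep:] if keep > 0 else []
```

In a real agent the three `StateTools` methods would be registered as tools the model can call, and `trim_window` would run before each inference call.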

3 When the simpler tool wins

This refactor pays back only under specific conditions: a long-horizon loop (tens of iterations), large per-iteration observations, running unattended in production, where the next decision usually needs only a subset of prior state. Miss any of those and reach for the cheaper layer.
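Those conditions amount to a checklist that falls back to caching when any one of them fails. A sketch: `LoopProfile`, `choose_strategy`, and both numeric thresholds are hypothetical placeholders; only the four conditions come from the text.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    iterations: int      # expected steps per run
    obs_tokens: int      # typical size of one observation
    unattended: bool     # runs in production with no human in the loop
    needs_subset: bool   # next decision usually needs only part of prior state


# Placeholder thresholds: the lesson says "tens of iterations" and "large
# observations" without fixing numbers, so tune these to your own loop.
MIN_ITERATIONS = 20
MIN_OBS_TOKENS = 1_000


def choose_strategy(p: LoopProfile) -> str:
    """Recommend state-carry only when every condition holds."""
    long_horizon = p.iterations >= MIN_ITERATIONS
    large_obs = p.obs_tokens >= MIN_OBS_TOKENS
    if long_horizon and large_obs and p.unattended and p.needs_subset:
        return "state-carry + cached prefix"
    return "prompt caching"
```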

Prompt caching attacks the same curve — for less work

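Below ~10 iterations with a stable prefix, prompt caching (Lesson 07) already discounts most of each call: the repeated prefix, and in an append-only transcript the replayed history too, is billed at the cache-hit rate rather than full input price. Anthropic charges ~10% of the base input price on a cache hit; OpenAI's cached-input discount varies by model (~50% on earlier models). The replay term still grows quadratically, but with a much smaller coefficient, and you get that with none of the engineering of typed state and a checkpointer. The two are complementary at the boundary: a long stateful loop still benefits from a cached prefix on its residual transcript. They compete only for short loops with stable observations, where caching wins.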
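A rough cost model shows how caching bends the curve without removing it. Assumptions, all simplifying: `hit` is the cache-hit price as a fraction of base input price (the ~10% Anthropic figure quoted above is `hit=0.1`), the transcript is append-only so each call's input begins with the previous call's input, and cache-write premiums, cache expiry, and output tokens are ignored. Both function names are hypothetical.

```python
def stateless_cached(n: int, prefix: int, obs: int, hit: float) -> float:
    """Price-weighted input tokens for a stateless loop with prompt caching.

    Call k re-sends prefix + (k - 1) observations. Everything the previous call
    sent is a cache hit billed at `hit` times base price; only the newest
    observation is billed in full. hit=1.0 models no caching at all.
    """
    total = 0.0
    for k in range(1, n + 1):
        if k == 1:
            total += prefix                     # first call: nothing cached yet
        else:
            replayed = prefix + obs * (k - 2)   # identical to the previous call's input
            total += hit * replayed + obs       # discounted replay + new observation
    return total


def stateful_cached(n: int, prefix: int, working_set: int, hit: float) -> float:
    """Stateful loop: the static prefix is cached after the first call, and the
    per-step working set read from state is billed at full price."""
    if n <= 0:
        return 0.0
    return prefix + hit * prefix * (n - 1) + working_set * n
```

With a 1,000-token prefix, 2,000-token observations and `hit=0.1`, the cached stateless loop bills about 27,100 token-equivalents at N=10 versus 100,000 uncached. At N=40 it reaches about 231,100, of which the quadratic replay term is 148,200, so the discount that is ample for a short loop stops keeping up on a long one.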

Beyond reaching for the wrong tool, the remaining failure modes are operational, not theoretical. Schema churn: every added field is a migration surface, and a bloated state monolith can sink a project. Concurrent writes without isolation corrupt shared state silently, and the damage surfaces several transitions downstream of its cause. And pruning the transcript down to typed state discards the audit trail that made causal debugging and replay possible, so keep the full trajectory when you need it.
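A minimal sketch of two mitigations for the concurrency and audit-trail risks, assuming a single-process store: optimistic concurrency (each write names the version it read, and a stale write is rejected instead of silently overwriting someone else's update) plus an append-only log that keeps the trajectory replayable after the transcript is trimmed. `VersionedState` and `VersionConflict` are hypothetical names; a production store would enforce the same check with its own conditional-write mechanism.

```python
import copy
import threading
from typing import Any


class VersionConflict(Exception):
    """Raised when a writer's view of the state is stale."""


class VersionedState:
    """Hypothetical shared state with optimistic concurrency and an audit log."""

    def __init__(self, initial: dict[str, Any]) -> None:
        self._lock = threading.Lock()
        self._initial = copy.deepcopy(initial)
        self._data = copy.deepcopy(initial)
        self._version = 0
        self.log: list[tuple[int, str, Any]] = []  # (version, field, value), append-only

    def read(self) -> tuple[int, dict[str, Any]]:
        """Return the current version together with a private copy of the data."""
        with self._lock:
            return self._version, copy.deepcopy(self._data)

    def write(self, expected_version: int, name: str, value: Any) -> int:
        """Apply one field update only if nobody has written since `expected_version`."""
        with self._lock:
            if expected_version != self._version:
                raise VersionConflict(
                    f"read at v{expected_version}, store is at v{self._version}"
                )
            self._data[name] = copy.deepcopy(value)
            self._version += 1
            self.log.append((self._version, name, copy.deepcopy(value)))
            return self._version

    def replay(self, upto: int) -> dict[str, Any]:
        """Rebuild the state as of version `upto` from the initial snapshot and the log."""
        data = copy.deepcopy(self._initial)
        with self._lock:
            entries = list(self.log)
        for version, name, value in entries:
            if version > upto:
                break
            data[name] = copy.deepcopy(value)
        return data
```

A writer that loses the race gets a `VersionConflict` at the point of the conflict, rather than a corrupted value several transitions later, and can re-read and retry.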

↪ Your win: stop re-billing the transcript

Reach for state-carry on long, unattended loops with large observations — that's where O(N²) bites.
Expose state via read/update tools so the agent pulls only the subset each step needs.
Persist with a real checkpointer (Redis/Postgres/DynamoDB), never in-memory in restart-prone environments.
Reach for prompt caching first on short loops — it hits the same curve for far less work.
Keep the full transcript when you need causal debugging or replay; typed state drops the audit trail.

Retrieval practice — recall, don't peek

Question 1: A stateless ReAct loop costs O(N²) tokens because each call…

Question 2: State-carry bends the curve to O(N) by keeping the record…

Question 3: For a short loop with a stable prefix, the cheaper fix is…

Question 4: A real risk of typed state-carry is that it…

Question 5 (spaced recall from Lesson 21): The recommended order for priming a coding agent is…

Next: The Anxious Agent, on why a model rushes to finish near its limit and the three moves that keep it from wrapping up early.