Context Engineering · ~7 min
A long agent loop that replays its whole transcript every turn re-bills every prior observation on every call. Lift the record out of the transcript and the cost curve bends from quadratic to linear.
A stateless ReAct loop appends each Thought–Action–Observation triple to the message history, then re-sends the whole transcript on the next call. Per-call input grows linearly with step n; total cost across N steps is O(N²) — every prior observation re-billed on every subsequent inference.
State-carry lifts the experimental record out of the transcript into a typed object that lives outside the prompt. The agent reads specific fields via a tool only when the current decision needs them. The conversation window stays roughly fixed-size, and total cost across N steps becomes O(N).
The saving is purely about where state lives, not about model reasoning. A stateless loop encodes the record in the transcript, which the inference call must re-process every turn. A stateful loop encodes it in an object outside the prompt, so per-call input is bounded by the working set the current step touches — not by cumulative history. The O(N²)→O(N) shift is a direct consequence of decoupling the record from the transcript.
The reference implementation uses LangGraph, but the pattern — state outside the prompt, reached by tool call — is framework-agnostic:
| Step | What you build |
|---|---|
| Typed state object | The loop's record: best metric, last params, recent failure traces, working files — keep fields minimal |
| State-read tools | read_state(field), update_state(field, v), list_recent_attempts(n) |
| A checkpointer | Persist via Redis / Postgres / DynamoDB — not in-memory, which loses everything on restart |
| A trimmed window | Keep only recent turns in the transcript; the state object is the source of truth |
This refactor pays back only under specific conditions: a long-horizon loop (tens of iterations), large per-iteration observations, running unattended in production, where the next decision usually needs only a subset of prior state. Miss any of those and reach for the cheaper layer.
Below ~10 iterations with a stable prefix, prompt caching (Lesson 07) already converts the dominant cost line to roughly O(1) for the static portion — Anthropic charges ~10% of input price on a cache hit, OpenAI ~50% — with none of the engineering of typed state and a checkpointer. The two are complementary at the boundary: a long stateful loop still benefits from a cached prefix on its residual transcript. They compete only for short loops with stable observations, where caching wins.
The other failure modes are operational, not theoretical. Schema churn — every added field is a migration surface, and a bloated state monolith has sunk projects. Concurrent writes without isolation corrupt shared state silently, surfacing several transitions downstream of the cause. And pruning the transcript to typed state discards the audit trail that made causal debugging and replay possible — keep the full trajectory when you need it.
Retrieval practice — recall, don't peek
Question 1A stateless ReAct loop costs O(n²) tokens because each call…
Question 2State-carry bends the curve to O(n) by keeping the record…
Question 3For a short loop with a stable prefix, the cheaper fix is…
Question 4A real risk of typed state-carry is that it…
Question 5 · spaced recall from Lesson 21The recommended order for priming a coding agent is…