The Infinite Context

The first instinct when an agent fails is to give it more. More files, more history, more docs. That instinct is the anti-pattern.

Why this, for you: this is the failure underneath half the others in this course. Learn to recognise it first — an agent that ignores instructions it followed at shorter lengths, produces generic output, or "notices" unrelated work to do — and you've got a diagnosis you'll reach for every day.

The pattern is seductive because it sounds safe: load every potentially relevant file, the full conversation history, all the docs, complete tool results. More information can't hurt. It can, and it does.

1 What it looks like

A coding agent is asked to fix one failing test in a 200-file repo. The developer pre-loads the whole codebase with @workspace, attaches 50 prior turns, and pastes three git log dumps — 180,000 tokens. The agent finds the test, then rewrites an unrelated module it "noticed," ignores the fix-only constraint from turn 1, and ships a 12-file diff. Roll back, retry with just the failing test and its two imports in a fresh session, and it fixes the test in one turn.

A larger context window does not produce better output. The fix was never in the extra context — the extra context is why the fix failed.

2 Why it happens

Attention is finite even when the window is not. Anthropic's context-engineering guide names this context rot: recall and use of information degrade as token count grows. Irrelevant context is not inert — it adds noise that competes with signal for a fixed attention budget, and tokens in the middle of a long prompt receive the least attention of all (the lost-in-the-middle effect).

The goal is the smallest high-signal set

Anthropic's framing: find "the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome" — not the largest set you can fit. Volume is the wrong objective.

3 The fix

Replace volume with precision. Each lever trades a chunk of always-loaded context for on-demand access:

# instead of: @workspace + 50 turns + 3 git logs (180K tokens) load on-demand — fetch reference material when the step needs it, not at startup skill metadata — expose descriptions; load full content only on invoke compact history — summarise prior turns instead of accumulating sub-agents — delegate retrieval; coordinator gets only the result prune results — store big outputs externally, pass a summary + path

LangChain's Deep Agents applies these in sequence as pressure rises — offload large results, truncate older tool calls, summarise history — rather than all at once.

When loading more is correct

The remediation loses under specific conditions: unreliable retrieval (poor recall beats RAG with high miss rates), a tight latency budget (on-demand adds round-trips), truly homogeneous context (a whole-repo rename has nothing irrelevant to exclude), and short tasks where sub-agent orchestration overhead isn't worth it.

↪ Your win: precision over volume

Diagnose by the symptom: instructions ignored at length, generic output, unrelated work appearing.
Aim for the smallest high-signal set, not the biggest window you can fill.
Load on-demand, compact, and isolate with sub-agents — three independent levers.
Preloading "just in case" is the most common cause; it's almost never the cheapest.
Know the exceptions: noisy retrieval, tight latency, homogeneous context flip the calculus.

Retrieval practice — recall, don't peek

Question 1A larger context window, on its own, tends to…

Question 2Anthropic's stated goal for context is the…

Question 3The most common cause of this anti-pattern is…

Question 4Loading more context can be the right call when…

Question 5Tokens in the middle of a long prompt are…

Ask me anything. Want the sub-agent isolation pattern that keeps a coordinator's context clean, or how on-demand retrieval can itself misfire? Next in Part 1: The Kitchen Sink Session — the same dilution, but caused by mixing unrelated tasks in one session.