Context Engineering · ~5 min

The Dumb Zone

Your agent gets dumber long before it runs out of room — and the safety net fires too late to help.

Why this, for you: the fastest daily-coding win. Knowing when a session has silently entered the dumb zone — and compacting before it does — is the difference between a clean diff and an agent that "starts repeating itself and missing obvious patterns."

Here's the claim most people get wrong. Ask an experienced engineer "when does a long context start hurting quality?" and they'll say something proportional — "past halfway," "when it's nearly full."

Degradation onset is closer to an absolute token threshold (~32K–100K) than a fixed percentage of the window — and it's a gradient, not a cliff. A bigger advertised window does not push the dumb zone proportionally later.

RULER tested 17 models: Yi-34B claims 200K but has ~32K of effective context (16%); GPT-4 claims 128K and reaches ~64K (50%). The Chroma study confirmed all 18 frontier models tested — Opus 4, GPT-4.1, Gemini 2.5 Pro included — degrade with input length. Anthropic calls it "context rot". In this workspace we call it the dumb zone.
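To make the RULER figures concrete, here is a minimal sketch that computes the effective share of the advertised window for the two models quoted above. The helper name `effective_share` is illustrative, and "K" is treated as 1,000 tokens for simplicity.

```python
def effective_share(advertised_tokens: int, effective_tokens: int) -> float:
    """Fraction of the advertised window that stays usable."""
    return effective_tokens / advertised_tokens


# Figures quoted from RULER in the text above ("K" taken as 1,000 tokens).
models = {
    "Yi-34B": (200_000, 32_000),
    "GPT-4": (128_000, 64_000),
}

for name, (advertised, effective) in models.items():
    share = effective_share(advertised, effective)
    print(f"{name}: {effective:,} of {advertised:,} tokens ({share:.0%})")
```

The larger advertised window ends up with the smaller effective share, which is the point: the usable region does not scale with the number on the box.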

It's not uniform — task type sets the threshold

The single most useful refinement: reasoning degrades fastest, retrieval is most resilient. Budget by task, not by one percentage rule.

Task type	Effective context	So…
Reasoning (planning, architecture)	10–20% of window	Keep under ~32K where you can
Multi-hop / semantic retrieval	16–50% of window	Prefer similarity over stuffing
Simple lookup (needle-in-haystack)	>99% recall, very deep	Tolerates large loads, but high recall here overstates quality on harder tasks
Code bug-fixing	Collapses fast*	Test at your real context length

*Claude 3.5 Sonnet on LongCodeBench: 29% at 32K → 3% at 256K. And total context counts everything — system prompt, instructions, skill defs, history — not just your task tokens.
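One way to apply "budget by task" is to add up everything in the window and compare it with the range from the table. The sketch below does that under stated assumptions: the 200,000-token window and every token count in `session` are invented for illustration, and the function names are hypothetical.

```python
# Effective-context ranges from the table above, as fractions of the window.
EFFECTIVE_RANGE = {
    "reasoning": (0.10, 0.20),
    "multi_hop_retrieval": (0.16, 0.50),
}


def total_context(parts: dict[str, int]) -> int:
    """Everything in the window counts, not just the task tokens."""
    return sum(parts.values())


def budget(task: str, window: int) -> tuple[int, int]:
    """Return the (conservative, optimistic) token budget for a task type."""
    low, high = EFFECTIVE_RANGE[task]
    return round(window * low), round(window * high)


# Hypothetical session: all token counts are made up for this example.
session = {
    "system_prompt": 4_000,
    "instructions": 3_000,
    "skill_defs": 6_000,
    "history": 30_000,
    "task": 5_000,
}

used = total_context(session)
low, high = budget("reasoning", window=200_000)
print(f"used={used:,} reasoning budget={low:,}-{high:,}")
if used > high:
    print("past the optimistic reasoning budget: compact before planning")
```

In this made-up session the task itself is only 5,000 tokens, yet the total is already past the reasoning budget because history and scaffolding dominate.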

The gap that bites you

Claude Code's auto-compaction fires at ~95% fill, but reasoning quality has been eroding since ~10–20%. By the time the safety net triggers, you've spent most of the session in the dumb zone: the whole stretch from ~10–20% up to ~95% fill is where quality silently erodes.
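The gap described above can be put in rough numbers. This is a back-of-the-envelope sketch that assumes the window fills roughly linearly as the session progresses, which real sessions only approximate; `share_past_onset` is an illustrative name.

```python
def share_past_onset(onset_pct: float, compaction_pct: float) -> float:
    """Share of the run-up to compaction that lies beyond degradation onset."""
    return (compaction_pct - onset_pct) / compaction_pct


for onset in (10, 20):
    share = share_past_onset(onset, compaction_pct=95)
    print(f"onset at {onset}% fill -> {share:.0%} of the run-up to compaction is degraded")
```

Under that linear-fill assumption, onset at 10% or 20% leaves roughly 89% or 79% of the pre-compaction stretch past the point where reasoning has started to slip.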

↪ Your win: compact before the zone, on purpose

Compact manually at these transitions: before reasoning-heavy work, after big file reads you've extracted from, at task-type switches, and the moment the agent repeats itself.

Direct what survives: /compact Focus on the failing assertions in X and the Y method; drop CI logs.

For a reasoning-heavy session, move the trigger earlier: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=55 claude (50–60% for architecture/debugging; leave 95% for pure retrieval).
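If you launch sessions from a script, the same override can be chosen per task type. The sketch below only builds the environment; it relies on the variable name given above, while the task labels, the 55 midpoint, and the `compaction_env` helper are assumptions made for illustration.

```python
import os
from typing import Optional

# Trigger percentages from the guidance above; task labels are illustrative.
AUTOCOMPACT_PCT = {
    "architecture": 55,
    "debugging": 55,
    "retrieval": 95,
}


def compaction_env(task: str, base_env: Optional[dict] = None) -> dict:
    """Return a copy of the environment with the auto-compaction override set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLAUDE_AUTOCOMPACT_PCT_OVERRIDE"] = str(AUTOCOMPACT_PCT[task])
    return env


# An empty base environment keeps the printed result small.
print(compaction_env("debugging", base_env={}))
# To launch with it (not run here):
#   subprocess.run(["claude"], env=compaction_env("debugging"))
```

Keeping the mapping in one place makes the policy easy to review: one number per task type rather than a value retyped on each launch.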

Retrieval practice — recall, don't peek

Question 1: Degradation onset is best modeled as…

Question 2: Reasoning tasks effectively use roughly what share of the window?

Question 3: Auto-compaction at 95% helps too little because by then…

I'm your teacher — ask me anything. Stuck on why an absolute threshold (not a %) makes sense given how attention scales? Want to set a per-project compaction policy in CLAUDE.md, or see this on your own repo? Just say so and we'll go deeper or move to Lesson 02.
