Compaction

Lesson 7 told a long-running agent to resume from a durable log instead of carrying everything in context. This is the move that keeps the in-session context worth carrying — replacing accumulated token mass with a dense summary before quality silently erodes.

Why this, for you: the course has leaned on "full context resets" — /clear between unrelated tasks — without teaching the in-between move. Compaction is that move: you keep the thread alive but shrink it, trading a clean wipe for a distilled summary. Get the timing wrong and you either reason in a degraded window or amputate context you still needed. This lesson is the timing.

Compaction replaces the accumulated conversation history with a dense summary, freeing the context window while preserving task intent and critical state. It sits between two extremes you already know: a full /clear (throw everything away) and carrying the whole history (let the window saturate).

1 The gap auto-compaction leaves open

Claude Code's auto-compaction fires at roughly 95% of the context window — and that is far too late. The window has a dumb zone: as context fills, pairwise token relationships stretch thin and reasoning degrades. Anthropic calls it "context rot," a gradient rather than a cliff, appearing "across all models."

The numbers are stark. BABILong found popular LLMs effectively utilize only 10–20% of a long context window for multi-step reasoning. LongCodeBench measured Claude 3.5 Sonnet's code bug-fixing collapse from 29% accuracy at 32K tokens to 3% at 256K. By the time the 95% trigger fires, the agent has been reasoning in the dumb zone for most of the session — the quality is already gone.

That gap, between degradation onset and the auto-trigger, is where output silently erodes. Manual compaction closes it by reframing the operation: not memory cleanup, but reasoning-quality preservation.

2 Compact at the seams, before the work that needs the window

The signal isn't a token count — it's a transition. Compact at the joints between phases, and right before you ask for the hardest reasoning of the session.

Compact when…	Don't compact when…
Before reasoning-intensive work (architecture, multi-step debugging)	The agent is mid-chain and needs the accumulated context to finish
After a bulk read whose details you've extracted	Reference material (schemas, specs) will be re-needed repeatedly
At task-type transitions (done searching, now planning)	You're iterating one file where the full edit history informs the next change

Direct what survives, rather than trusting a blind summary. A focus directive — /compact Focus on the API changes and the test failures — or a persistent CLAUDE.md block that always preserves the task objective, modified file paths, and unresolved errors. Compaction is lossy; the directive is how you steer the loss away from what matters.

# debugging session, context at ~55% — compact before reasoning /compact Focus on the three failing assertions in test_payment_flow.py and PaymentService.process(). Discard CI logs and unrelated files. # context drops ~55% -> ~15%; with a clean window the agent # then spots the race condition it had been overlooking

3 Make it earlier, make it graded, make it non-lossy

Three levers turn compaction from a panic button into a discipline:

Lower the trigger. CLAUDE_AUTOCOMPACT_PCT_OVERRIDE takes a value 1–100. Set it to 50–60% for reasoning-heavy sessions, 70–80% for mixed, and leave the 95% default only for retrieval-heavy work that tolerates a fuller window.

Graduate the stages. A single binary compaction is a cliff. OPENDEV's five-stage Adaptive Context Compaction degrades incrementally — warn at 70%, mask older observations at 80%, prune at 85%, aggressive-mask at 90%, full LLM summary at 99% — so the agent never hits one moment where the whole history collapses at once.

Offload, don't just summarize. The strongest pattern pairs summarization with offloading: large tool payloads go to disk, replaced by a reference plus a brief summary, recoverable on demand. That makes the operation selective discarding, not lossy encoding — artifacts stay on disk.

It's measurable. One study found that pruning context to the last five tool call/response pairs plus summarization reached 91.6% task completion versus 71% for full-context agents — at a fraction of the tokens and runtime. Carrying full history is not the safe default; it's the slow, degraded one.

When compacting costs more than it saves

Compaction has its own failure mode: aggressive summarization drops subtle constraints whose importance only surfaces later — Anthropic warns that "overly aggressive compaction can result in the loss of subtle but critical context." Each cycle adds summarization error; long sessions accumulate drift a single summary can't undo. And a too-low threshold forces lossy summarization while the window is still navigable, risking objective drift if the scope constraint gets omitted. Start from maximum recall and tune toward precision — not the reverse. If offloaded payloads get deleted, recoverability breaks and the whole approach is worse than just holding the context.

↪ Your win: shrink the thread before it goes dumb

Compact at the seams — phase transitions and just before the hardest reasoning, not at 95%.
Direct the loss — a focus directive or CLAUDE.md block preserves objective, paths, and open errors.
Lower the trigger for reasoning work — CLAUDE_AUTOCOMPACT_PCT_OVERRIDE to 50–60%.
Graduate the stages — incremental degradation beats one compression cliff.
Offload before you summarize — payloads to disk keep compaction recoverable, not just smaller.

Retrieval practice — recall, don't peek

Question 1Auto-compaction at ~95% fires too late because the agent has already…

Question 2BABILong found LLMs effectively use only what fraction of a long window for reasoning?

Question 3The best moment to compact manually is generally…

Question 4Offloading a large tool payload to disk makes compaction…

Question 5 · spaced recall from Lesson 7A long-running agent survives session boundaries by moving state into…

Ask me anything. Want a CLAUDE.md compact block tuned to your project, or a threshold table mapped to your session types? Next in Part 8: Cost Controls & Circuit Breakers — the third kind of limit, where the boundary is the budget and the failure mode is a loop that burns tokens without progress.

✎ Feedback