Two different moves get called "compaction." One is recoverable; one is lossy. Knowing which to reach for is the skill.
Why this, for you: Lesson 09's masking was one tier of a bigger system. This is the full toolkit
for keeping a long agent session alive and sharp — the daily-coding payoff (#1) of never losing a multi-hour task to
the dumb zone, and a core harness-design pattern (#2).
Long-horizon tasks accumulate context until the agent truncates or fails. Compression buys room —
but "compress" hides two fundamentally different operations.
Tier 1 · Offload
Move payloads to disk
Replace a big tool result (full file, API response) with a reference + summary. Content lives on disk; the agent re-reads on demand. Recoverable, non-lossy.
Tier 2 · Summarise
Distil the history
Replace prior turns with a summary of objective, state, decisions, next steps. Lossy — once summarised, the detail is gone. Use only when offloading isn't enough.
The governing principle: selective discarding, not lossy encoding. Offload anything addressable
on demand (it stays on disk); summarise only the conversation history, and only the parts that no longer change the
outcome.
Apply in tiers, as pressure rises
Offload large tool responses to the filesystem (reference + brief summary).
Summarise conversation history when context still fills — objective, state, constraints, next steps.
Graduated stages let the agent degrade incrementally rather than hitting a single compression cliff where the whole history collapses at once. Keep the most recent tool outputs at full fidelity.
91.6% vs 71%
Pruning to the last ~5 tool-call/response pairs plus summarisation reached 91.6%
task completion versus 71% for full-context agents — at a fraction of the tokens
(arXiv:2606.10209). Less context, better outcomes.
What a summary must preserve
A summary of "what happened" without "what matters next" causes objective drift. Structure it:
Objective
The original task and any scope changes
State
What's been built, changed, or decided
Constraints
Any constraint surfaced during the session
Next steps
The immediate next action
Two edges
① Start at max recall, iterate toward precision
Aggressive summarisation drops subtle constraints whose importance only emerges later. Anthropic's guidance:
begin by preserving too much, then tighten — not the reverse. And offloaded payloads must persist
for the whole session; delete the store and recoverability is gone (worse than never offloading).
② Compaction is cache-friendly if you let it be (Lesson 07)
Compaction reuses the parent session's cached prefix — a cache_control breakpoint at the end of the
system prompt keeps that cache valid across the cycle; only the new summary is written fresh. Fork, don't rebuild.
↪ Your win: reach for the cheapest tier that buys the room
Offload before you summarise — keep recoverable payloads on disk, not in context.
Mask single-use outputs (Lesson 09) before touching history.
Summarise last, with Objective / State / Constraints / Next steps — never just "what happened".
Keep recent outputs full-fidelity; degrade graduated, not cliff-edge.
Don't over-compress — start max-recall; the dumb zone, not capacity, is the real enemy.
Retrieval practice — recall, don't peek
Question 1Offloading a large payload to disk is…
Question 2Tiered compression applies in what order?
Question 3Beyond objective, state, and constraints, a summary must keep…
Question 4Pruning to the last few tool calls plus summarising tends to…
Question 5 · spaced recall from Lesson 09Most tool outputs in a session are…
Ask me anything. Want the five-stage graduated schedule (warn 70% → mask 80% → prune 85% → aggressive
90% → full compaction 99%)? Or how an "artifact index" makes compaction effectively non-lossy? Next in Part 2:
Just-in-Time Retrieval — pulling context in when needed instead of carrying it.