Part 2 · Taming the Tail

Context Engineering · ~7 min

Offload vs Summarise

Two different moves get called "compaction." One is recoverable; one is lossy. Knowing which to reach for is the skill.

Why this, for you: Lesson 09's masking was one tier of a bigger system. This is the full toolkit for keeping a long agent session alive and sharp — the daily-coding payoff (#1) of never losing a multi-hour task to the dumb zone, and a core harness-design pattern (#2).

Long-horizon tasks accumulate context until the agent truncates or fails. Compression buys room — but "compress" hides two fundamentally different operations.

Tier 1 · Offload

Move payloads to disk

Replace a big tool result (full file, API response) with a reference + summary. Content lives on disk; the agent re-reads on demand. Recoverable, non-lossy.

Tier 2 · Summarise

Distil the history

Replace prior turns with a summary of objective, state, decisions, next steps. Lossy — once summarised, the detail is gone. Use only when offloading isn't enough.

The governing principle: selective discarding, not lossy encoding. Offload anything addressable on demand (it stays on disk); summarise only the conversation history, and only the parts that no longer change the outcome.

Apply in tiers, as pressure rises

  1. Offload large tool responses to the filesystem (reference + brief summary).
  2. Mask older single-use tool outputs (Lesson 09) — pointers replace raw output.
  3. Summarise conversation history when context still fills — objective, state, constraints, next steps.

Graduated stages let the agent degrade incrementally rather than hitting a single compression cliff where the whole history collapses at once. Keep the most recent tool outputs at full fidelity.

91.6% vs 71%
Pruning to the last ~5 tool-call/response pairs plus summarisation reached 91.6% task completion versus 71% for full-context agents — at a fraction of the tokens (arXiv:2606.10209). Less context, better outcomes.

What a summary must preserve

A summary of "what happened" without "what matters next" causes objective drift. Structure it:

ObjectiveThe original task and any scope changes
StateWhat's been built, changed, or decided
ConstraintsAny constraint surfaced during the session
Next stepsThe immediate next action

Two edges

① Start at max recall, iterate toward precision

Aggressive summarisation drops subtle constraints whose importance only emerges later. Anthropic's guidance: begin by preserving too much, then tighten — not the reverse. And offloaded payloads must persist for the whole session; delete the store and recoverability is gone (worse than never offloading).

② Compaction is cache-friendly if you let it be (Lesson 07)

Compaction reuses the parent session's cached prefix — a cache_control breakpoint at the end of the system prompt keeps that cache valid across the cycle; only the new summary is written fresh. Fork, don't rebuild.

↪ Your win: reach for the cheapest tier that buys the room

Retrieval practice — recall, don't peek

Question 1Offloading a large payload to disk is…

Question 2Tiered compression applies in what order?

Question 3Beyond objective, state, and constraints, a summary must keep…

Question 4Pruning to the last few tool calls plus summarising tends to…

Question 5 · spaced recall from Lesson 09Most tool outputs in a session are…

Ask me anything. Want the five-stage graduated schedule (warn 70% → mask 80% → prune 85% → aggressive 90% → full compaction 99%)? Or how an "artifact index" makes compaction effectively non-lossy? Next in Part 2: Just-in-Time Retrieval — pulling context in when needed instead of carrying it.
✎ Feedback