Part 4 · Integrity & Operations

Context Engineering · ~6 min

What's Eating the Window

"Context is 80% full" tells you that you have a problem. It doesn't tell you which tool call caused it — so you prune blindly and often cut the wrong thing.

Why this, for you: Lesson 18 said measure before you optimize. This is the instrument. Before you compress, mask, or compact, you want per-tool attribution — which call actually inflated the window — so you fix the culprit instead of guessing. It's the difference between a profiler and a wild prune.

Agents accumulate context silently. A large file read, a verbose grep, an accumulated error trace — each adds thousands of tokens without any single call looking expensive. Aggregate metrics (percent full) tell you that you have a problem; they never tell you which tool caused it.

1 The diagnostic surface

Claude Code's /context command (shipped v2.1.74, 2026-03-12) is the first developer-facing example to ship in a major harness. It attributes token consumption to specific tool calls, memory files, and outputs.

It exposes tool-level attribution (which calls cost the most tokens), memory-bloat flags (files that grew unbounded), capacity warnings (quantified headroom to the limit), and actionable tips per finding. That moves context management from reactive — compress when full — to diagnostic: find and fix the culprit before compression is even necessary.

2 The usual suspects

Per-tool attribution almost always surfaces the same short list of offenders — and each has a targeted remedy:

CulpritWhy it's expensiveRemediation
Large file readsEntire file enters context regardless of relevanceTruncate to the relevant section; load semantically
Verbose tool outputsGrep / build / test output without filteringAdd --max-count; filter before surfacing
Accumulated error tracesRepeated full stack traces compound fastKeep the first occurrence, drop duplicates
Memory filesCLAUDE.md / scratch files grow unbounded across sessionsPeriodically compact or reset entries

3 Diagnose, then compress — not the other way

The ordering is the whole point. Run the diagnostic before reaching for compression.

Compression without attribution risks discarding high-value content while leaving the actual inflator in place. Identify the offender, apply the targeted fix (truncate / filter / offload), then re-run to verify the reduction. Measure-then-act, not compress-and-hope — the same principle as per-query profiling in a database.

When attribution comes up empty

Per-tool attribution only helps when the expensive tool is also avoidable. It reports modest tool costs while the window is still full when the inflation lives outside tool calls — long conversation history, a large system prompt, accumulated reasoning. Those are the targets for manual compaction, not truncation. And a mandatory full-repo scan or a required large API payload will show up as the culprit with no remediation to offer.

4 Generalising past Claude Code

No other major harness currently documents an equivalent developer-facing diagnostic. LangChain's Deep Agents auto-summarises but exposes no per-tool breakdown; OPENDEV's adaptive compaction keeps attribution internal to the compactor. The pattern still generalises: for any harness without one, instrument at the tool-call boundary — log token counts before and after each invocation, then aggregate by tool type.

↪ Your win: profile the window before you prune it

Retrieval practice — recall, don't peek

Question 1The gap that per-tool attribution closes is that aggregate metrics tell you…

Question 2The recommended ordering is to…

Question 3Per-tool attribution comes up empty when inflation lives in…

Question 4For a harness with no built-in diagnostic, you should…

Question 5 · spaced recall from Lesson 19Indirect prompt injection works because the model…

Ask me anything. Want the boundary-instrumentation snippet that logs per-tool token deltas in a harness without /context, or a rule for when a culprit is "unavoidable" vs worth fixing? Next: Prime the Pump — the deliberate preload that's the counterpart to just-in-time retrieval.
✎ Feedback