Context Engineering · ~6 min
"Context is 80% full" tells you that you have a problem. It doesn't tell you which tool call caused it — so you prune blindly and often cut the wrong thing.
Agents accumulate context silently. A large file read, a verbose grep, an accumulated error trace — each adds thousands of tokens without any single call looking expensive. Aggregate metrics (percent full) tell you that you have a problem; they never tell you which tool caused it.
Claude Code's /context command (shipped v2.1.74, 2026-03-12) is the first developer-facing example to
ship in a major harness. It attributes token consumption to specific tool calls, memory files, and outputs.
Per-tool attribution almost always surfaces the same short list of offenders — and each has a targeted remedy:
| Culprit | Why it's expensive | Remediation |
|---|---|---|
| Large file reads | Entire file enters context regardless of relevance | Truncate to the relevant section; load semantically |
| Verbose tool outputs | Grep / build / test output without filtering | Add --max-count; filter before surfacing |
| Accumulated error traces | Repeated full stack traces compound fast | Keep the first occurrence, drop duplicates |
| Memory files | CLAUDE.md / scratch files grow unbounded across sessions | Periodically compact or reset entries |
The ordering is the whole point. Run the diagnostic before reaching for compression.
Per-tool attribution only helps when the expensive tool is also avoidable. It reports modest tool costs while the window is still full when the inflation lives outside tool calls — long conversation history, a large system prompt, accumulated reasoning. Those are the targets for manual compaction, not truncation. And a mandatory full-repo scan or a required large API payload will show up as the culprit with no remediation to offer.
No other major harness currently documents an equivalent developer-facing diagnostic. LangChain's Deep Agents auto-summarises but exposes no per-tool breakdown; OPENDEV's adaptive compaction keeps attribution internal to the compactor. The pattern still generalises: for any harness without one, instrument at the tool-call boundary — log token counts before and after each invocation, then aggregate by tool type.
/context first — get per-tool attribution before any compression move.Retrieval practice — recall, don't peek
Question 1The gap that per-tool attribution closes is that aggregate metrics tell you…
Question 2The recommended ordering is to…
Question 3Per-tool attribution comes up empty when inflation lives in…
Question 4For a harness with no built-in diagnostic, you should…
Question 5 · spaced recall from Lesson 19Indirect prompt injection works because the model…
/context, or a rule for when a culprit is "unavoidable" vs worth fixing? Next:
Prime the Pump — the deliberate preload that's the counterpart to just-in-time retrieval.