Part 4 · Many Agents, One Trace

Observability · ~7 min

Attributing the Context

"82% full" names a symptom, not a cause. Cut the window into the sources you can actually act on — rules, skills, MCP, subagents, history — and you prune the right one instead of compacting on reflex.

Why this, for you: the reflex when a window fills is to compact or restart — which throws away working state to fight the wrong source. This lesson gives you the breakdown that turns one scary percentage into a short list of remediations, each tied to a thing you can unload. It's the first lesson that treats the context window itself as a system to be observed.

A single "78% of the context window" indicator tells you the window is filling and nothing about which source filled it. Per-source attribution breaks that one number into the configuration sources consuming the budget — rules, skills, MCP returns, subagent transcripts, conversation — because each maps to a different thing you can prune.

1 Two cuts of the same telemetry

There are two ways to slice the same token counts, and they answer different questions.

Cut | Answers
--- | ---
Per-tool | Which tool call dumped the most tokens (Claude Code's /context)
Per-source | Which configuration source is consuming budget, regardless of call
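To make the two cuts concrete, here is a minimal sketch that slices one set of token records both ways. The record layout, tool names, and token counts are hypothetical and chosen only for illustration; no real product exports this exact shape.

```python
from collections import defaultdict

# Hypothetical per-call records: (tool, source, tokens).
# Names and numbers are illustrative, not a real export format.
RECORDS = [
    ("read_file", "tool_outputs", 4200),
    ("search_docs", "mcp", 9100),
    ("search_docs", "mcp", 7300),
    ("load_skill", "skills", 2500),
    ("spawn_reviewer", "subagents", 6000),
]


def cut(records, key_index):
    """Sum tokens by the chosen column: 0 = per-tool, 1 = per-source."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


per_tool = cut(RECORDS, 0)    # which call dumped the most tokens
per_source = cut(RECORDS, 1)  # which configuration source holds the budget

print(per_tool)
print(per_source)
```

Both cuts sum to the same total; they differ only in which column names the thing you would act on.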

Cursor shipped per-source attribution on 2026-05-06 — "you can now see a breakdown of your agent's context usage" — with categories rules, skills, MCPs, subagents. The categories matter because each maps to one remediation primitive: unload a skill, disable an MCP server, prune a rule file, kill a subagent.

The categories must match what you can act on. A breakdown that collapses two of them into one bucket ("static prompt: 36%") leaves you unable to choose between unloading a skill and pruning a rule. Collapsing categories defeats the cut.

2 The categories, each with a remedy

Seven sources, seven distinct remediation surfaces. The point of separating them is that each one is fixed differently.

Source | Remediation
--- | ---
Rules / instruction files | Prune CLAUDE.md / AGENTS.md against the rule budget
Skill definitions | Mark low-value skills name-only or off
MCP tool returns | Drop a server, narrow tool selection, audit output cost
Subagent transcripts | Tighten the output schema; summarise instead of forward
Tool outputs (non-MCP) | Truncate at the call site; mask observations
Conversation history | Compact, or split into a fresh session
Cache prefix | Stable across turns; flag only when the prefix bloats
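One way to put the table to work is to pair each source's share of the window with its remedy, largest share first. The sketch below assumes you already have per-source token counts from somewhere; the dictionary keys, the 200,000-token window, and the sample counts are all hypothetical.

```python
# Remediation text comes from the table above; the short keys are
# hypothetical labels invented for this sketch.
REMEDIATION = {
    "rules": "Prune CLAUDE.md / AGENTS.md against the rule budget",
    "skills": "Mark low-value skills name-only or off",
    "mcp": "Drop a server, narrow tool selection, audit output cost",
    "subagents": "Tighten the output schema; summarise instead of forward",
    "tool_outputs": "Truncate at the call site; mask observations",
    "history": "Compact, or split into a fresh session",
    "cache_prefix": "Flag only when the prefix bloats",
}


def attribute(tokens_by_source, window_size):
    """Return (source, share_of_window, remediation), largest share first."""
    rows = [
        (source, tokens / window_size, REMEDIATION[source])
        for source, tokens in tokens_by_source.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative counts against an assumed 200,000-token window.
sample = {"rules": 12_000, "skills": 30_000, "mcp": 70_000, "history": 44_000}
report = attribute(sample, 200_000)

for source, share, remedy in report:
    print(f"{source:<10} {share:>5.0%}  {remedy}")
```

The first row of the report is the first thing to prune, which is the point of keeping the categories separate.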

3 The same cut, exported via OTel

This isn't only a UI panel. Claude Code's OTel exporter already carries the attributes that make per-source attribution computable from telemetry — the UI surface and the export consume the same counts. The claude_code.token.usage metric carries type (input / output / cacheRead / cacheCreation) and query_source (main / subagent / auxiliary).

# group the same metric two ways
group by query_source   # the subagent-vs-main split
group by type           # active-input vs cached-prefix tokens
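The same two groupings can be sketched in plain Python over exported data points. The list below is a hypothetical flattening of claude_code.token.usage points into dictionaries; the values are made up, and how you get points out of your collector depends on your backend.

```python
from collections import Counter

# Hypothetical data points for claude_code.token.usage, flattened to
# dicts. Attribute names follow the lesson; the values are invented.
POINTS = [
    {"type": "input", "query_source": "main", "value": 1200},
    {"type": "cacheRead", "query_source": "main", "value": 48000},
    {"type": "output", "query_source": "main", "value": 900},
    {"type": "input", "query_source": "subagent", "value": 5200},
    {"type": "output", "query_source": "subagent", "value": 3100},
    {"type": "cacheCreation", "query_source": "auxiliary", "value": 700},
]


def group_by(points, attribute):
    """Sum the metric value per distinct value of one attribute."""
    totals = Counter()
    for point in points:
        totals[point[attribute]] += point["value"]
    return dict(totals)


by_source = group_by(POINTS, "query_source")  # subagent-vs-main split
by_type = group_by(POINTS, "type")            # active input vs cached prefix

print(by_source)
print(by_type)
```

Because both groupings read the same points, their totals agree; a mismatch would suggest some points are missing an attribute.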

Cursor's panel is the always-on surface; an OTel collector is the post-hoc audit path. Useful thresholds turn a chart into a signal: MCP returns > 30% and rising means an unbounded server; skills > 20% on a session that never invoked one means descriptions are too verbose; subagent transcripts > 15% means handoff schemas are missing.
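The thresholds above translate directly into a small check. This is a sketch under stated assumptions: the function name, the share keys, and the two boolean inputs (whether the MCP share is rising, whether any skill was invoked) are hypothetical, and you would derive them from your own telemetry.

```python
def flag_sources(shares, mcp_rising, skill_invoked):
    """Apply the lesson's thresholds to per-source shares of the window.

    shares: fraction of the window per source, e.g. {"mcp": 0.34}.
    mcp_rising / skill_invoked: booleans you compute elsewhere.
    """
    flags = []
    if shares.get("mcp", 0.0) > 0.30 and mcp_rising:
        flags.append("mcp: likely an unbounded server")
    if shares.get("skills", 0.0) > 0.20 and not skill_invoked:
        flags.append("skills: descriptions too verbose")
    if shares.get("subagents", 0.0) > 0.15:
        flags.append("subagents: handoff schemas missing")
    return flags


# Illustrative session: MCP share rising, no skill ever invoked.
flags = flag_sources(
    {"mcp": 0.34, "skills": 0.22, "subagents": 0.10},
    mcp_rising=True,
    skill_invoked=False,
)
print(flags)
```

Treat the cut-offs as starting points to tune against your own sessions rather than fixed limits.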

Trust the denominator before the slice

A breakdown is only trustworthy when it sums input, output, and cache tokens. Counting input alone undercounts the budget — Claude Code once showed ~20% full while the session was actually at its limit. And the cut itself can be the wrong axis: when a long agentic run is dominated by file reads and grep, everything buckets into one giant "tools" slice that points at no specific call. Switch to per-tool attribution then.
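A quick arithmetic check shows how an input-only denominator can read about 20% on a session that is actually full. The token counts and the 200,000-token window below are illustrative numbers chosen to reproduce that gap, not measurements from any real session.

```python
# Illustrative usage for one session, keyed by token type.
USAGE = {
    "input": 40_000,
    "output": 12_000,
    "cacheRead": 140_000,
    "cacheCreation": 8_000,
}
WINDOW = 200_000  # assumed window size for this sketch


def fill_ratio(usage, window_size, kinds):
    """Fraction of the window used, counting only the given token types."""
    return sum(usage[kind] for kind in kinds) / window_size


input_only = fill_ratio(USAGE, WINDOW, ["input"])
all_kinds = fill_ratio(USAGE, WINDOW, list(USAGE))

print(f"input only: {input_only:.0%}")
print(f"all kinds:  {all_kinds:.0%}")
```

If the two ratios diverge this much in your own data, fix the denominator before trusting any per-source slice built on it.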

↪ Your win: prune the source, not the symptom

Retrieval practice — recall, don't peek

Question 1: Per-source attribution exists so that an operator can…

Question 2: Collapsing two categories into "static prompt: 36%" is bad because it…

Question 3: To get the subagent-vs-main split from OTel, group the token metric by…

Question 4: A breakdown that shows ~20% while the session is at its limit most likely…

Question 5 · spaced recall from Lesson 06: The most reliable outcome grader for a coding agent is…

Ask me anything. Want the action-threshold table for when to drop an MCP server versus mark a skill name-only, or how the per-source and per-tool cuts complement each other on one bloated session? Next in Part 4: Catching the Wasted Run — six signals that diagnose a multi-agent failure mid-trajectory.