Observability · ~7 min
"82% full" names a symptom, not a cause. Cut the window into the sources you can actually act on — rules, skills, MCP, subagents, history — and you prune the right one instead of compacting on reflex.
A single "78% of the context window" indicator tells you the window is filling and nothing about which source filled it. Per-source attribution breaks that one number into the configuration sources consuming the budget — rules, skills, MCP returns, subagent transcripts, conversation — because each maps to a different thing you can prune.
There are two ways to slice the same token counts, and they answer different questions.
| Cut | Answers |
|---|---|
| Per-tool | Which tool call dumped the most tokens (Claude Code's /context) |
| Per-source | Which configuration source is consuming budget, regardless of call |
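
As a minimal sketch of the two cuts, assume each tool call is logged as a record with a tool name, a source category, and a token count. The record layout, field names, and numbers below are hypothetical and not taken from any product's export format.

```python
from collections import defaultdict

# Hypothetical per-call records; field names and token counts are illustrative only.
calls = [
    {"tool": "read_file", "source": "tool_outputs", "tokens": 4200},
    {"tool": "grep", "source": "tool_outputs", "tokens": 1800},
    {"tool": "github.search", "source": "mcp", "tokens": 6100},
    {"tool": "github.get_issue", "source": "mcp", "tokens": 2900},
]


def cut(records, key):
    """Sum tokens by one field: 'tool' gives the per-tool cut, 'source' the per-source cut."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["tokens"]
    return dict(totals)


per_tool = cut(calls, "tool")      # which call dumped the most tokens
per_source = cut(calls, "source")  # which configuration source is consuming budget
```

Both cuts sum to the same total. Only the grouping key changes, which is why they answer different questions about the same session.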
Cursor shipped per-source attribution on 2026-05-06 ("you can now see a breakdown of your agent's context usage") with four categories: rules, skills, MCPs, and subagents. The categories matter because each maps to one remediation primitive: unload a skill, disable an MCP server, prune a rule file, kill a subagent.
Seven sources, seven distinct remediation surfaces. The point of separating them is that each one is fixed differently.
| Source | Remediation |
|---|---|
| Rules / instruction files | Prune CLAUDE.md / AGENTS.md against the rule budget |
| Skill definitions | Mark low-value skills name-only or off |
| MCP tool returns | Drop a server, narrow tool selection, audit output cost |
| Subagent transcripts | Tighten the output schema; summarise instead of forward |
| Tool outputs (non-MCP) | Truncate at the call site; mask observations |
| Conversation history | Compact, or split into a fresh session |
| Cache prefix | Stable across turns — flag only when the prefix bloats |
This isn't only a UI panel. Claude Code's OTel exporter already carries the attributes that make per-source attribution computable from telemetry; the UI surface and the export consume the same counts. The claude_code.token.usage metric carries type (input / output / cacheRead / cacheCreation) and query_source (main / subagent / auxiliary).
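
The grouping described above might look like the following sketch. It assumes the metric's data points have already been exported and flattened into plain dictionaries. That layout, the helper name `group_tokens`, and every value are assumptions for illustration; only the attribute names come from the text.

```python
from collections import defaultdict

# Hypothetical flattened data points for claude_code.token.usage.
# Attribute names follow the text; the dict layout and values are assumed.
points = [
    {"query_source": "main", "type": "input", "value": 12000},
    {"query_source": "main", "type": "cacheRead", "value": 30000},
    {"query_source": "main", "type": "output", "value": 2500},
    {"query_source": "subagent", "type": "input", "value": 8000},
    {"query_source": "subagent", "type": "output", "value": 1500},
    {"query_source": "auxiliary", "type": "input", "value": 1000},
]


def group_tokens(data_points, *attrs):
    """Sum token values, keyed by a tuple of the requested attribute values."""
    totals = defaultdict(int)
    for point in data_points:
        totals[tuple(point[a] for a in attrs)] += point["value"]
    return dict(totals)


by_source = group_tokens(points, "query_source")
by_source_and_type = group_tokens(points, "query_source", "type")

total = sum(by_source.values())
shares = {key[0]: value / total for key, value in by_source.items()}
```

Grouping by `query_source` alone gives the subagent-versus-main split. Adding `type` shows how much of each slice is cache reads rather than fresh input.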
Cursor's panel is the always-on surface; an OTel collector is the post-hoc audit path. Useful thresholds turn a chart into a signal: MCP returns > 30% and rising means an unbounded server; skills > 20% on a session that never invoked one means descriptions are too verbose; subagent transcripts > 15% means handoff schemas are missing.
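
The three thresholds above can be sketched as a small check over per-source shares. The function name, the dictionary keys, and the way "rising" is detected (comparing against one earlier sample) are all assumptions made for this example, not part of any tool's API.

```python
def flag_sources(shares, previous_mcp_share=None, skills_invoked=0):
    """Return warnings for the three thresholds named in the text.

    shares: fraction of the window per source (0.0 to 1.0); key names are hypothetical.
    previous_mcp_share: an earlier MCP share, used as a crude "rising" test.
    skills_invoked: number of skill invocations seen in the session.
    """
    flags = []
    mcp = shares.get("mcp", 0.0)
    rising = previous_mcp_share is not None and mcp > previous_mcp_share
    if mcp > 0.30 and rising:
        flags.append("mcp: possible unbounded server")
    if shares.get("skills", 0.0) > 0.20 and skills_invoked == 0:
        flags.append("skills: descriptions may be too verbose")
    if shares.get("subagents", 0.0) > 0.15:
        flags.append("subagents: handoff schema may be missing")
    return flags


# Illustrative session: MCP share grew from 28% to 34%, and no skill was ever invoked.
flags = flag_sources(
    {"mcp": 0.34, "skills": 0.22, "subagents": 0.10},
    previous_mcp_share=0.28,
    skills_invoked=0,
)
```

A real check would likely look at a trend over several samples rather than a single earlier value. The two-point comparison here only keeps the sketch short.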
A breakdown is only trustworthy when it sums input, output, and cache tokens. Counting input alone undercounts the budget — Claude Code once showed ~20% full while the session was actually at its limit. And the cut itself can be the wrong axis: when a long agentic run is dominated by file reads and grep, everything buckets into one giant "tools" slice that points at no specific call. Switch to per-tool attribution then.
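
The undercount is easy to see with arithmetic. In this sketch the window size and every token count are invented so that the input-only figure lands near the ~20% in the anecdote; they are not measurements from a real session.

```python
WINDOW = 200_000  # hypothetical window size in tokens

# Hypothetical usage by token type, using the type names from the metric above.
usage = {
    "input": 40_000,
    "output": 12_000,
    "cacheRead": 130_000,
    "cacheCreation": 16_000,
}

# Counting input alone versus summing input, output, and cache tokens.
input_only_pct = 100 * usage["input"] / WINDOW
full_pct = 100 * sum(usage.values()) / WINDOW

print(f"input only: {input_only_pct:.0f}% of window")
print(f"all types:  {full_pct:.0f}% of window")
```

With these made-up numbers the input-only view reads 20% while the summed view reads 99%, which is the shape of the failure described above.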
query_source and type: the same counts as the panel, audited post-hoc.
Retrieval practice: recall, don't peek
Question 1: Per-source attribution exists so that an operator can…
Question 2: Collapsing two categories into "static prompt: 36%" is bad because it…
Question 3: To get the subagent-vs-main split from OTel, group the token metric by…
Question 4: A breakdown that shows ~20% while the session is at its limit most likely…
Question 5 · spaced recall from Lesson 06: The most reliable outcome grader for a coding agent is…
name-only, or how the per-source and per-tool cuts complement each other on one bloated session?
Next in Part 4: Catching the Wasted Run, six signals that diagnose a multi-agent failure mid-trajectory.