Observability · ~7 min
Six signals per agent are useless if you can't ask "show me everything this subagent did." A stable agent_id on every header and span makes the trace queryable by identity — no tree walk required.
Across 200 spans and 12 subagents, span hierarchy can answer "which spans belong to a subagent" via tree traversal — but only inside a single session, and only while lineage holds. Once work crosses a boundary the instrumentation doesn't cover — a shell-out, a webhook, a queue — lineage is gone. The fix is a flat, propagated identifier that survives where the trace context ends.
Claude Code 2.1.139 instruments subagents on two surfaces at once, carrying the same identity pair on each.
| Surface | Mechanism · value |
|---|---|
| Outgoing HTTP | Headers x-claude-code-agent-id / -parent-agent-id |
| OTEL spans | claude_code.llm_request attrs agent_id / parent_agent_id |
agent_id identifies the subagent that issued a request (absent on the main session);
parent_agent_id identifies the agent that spawned it. The pair is load-bearing: agent_id
supports "all work by this subagent"; parent_agent_id reconstructs the dispatch hierarchy from a
flat query — "which subagent spawned this 429-emitting child" — without walking the span tree.
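The hierarchy rebuild described above can be sketched in a few lines. This is a minimal illustration, assuming spans have been exported as plain dictionaries; the record shape, the sample IDs, and the `parent_index` / `dispatch_chain` helpers are hypothetical, and it further assumes a subagent spawned directly by the main session has no `parent_agent_id`.

```python
# Hypothetical exported span records. Only the agent_id / parent_agent_id
# attribute names come from the contract; IDs and statuses are made up.
SPANS = [
    {"agent_id": "a1", "parent_agent_id": None, "status": 200},
    {"agent_id": "a2", "parent_agent_id": "a1", "status": 200},
    {"agent_id": "a3", "parent_agent_id": "a2", "status": 429},
]


def parent_index(spans):
    """Map each agent_id to its parent_agent_id in one flat pass."""
    parents = {}
    for span in spans:
        agent_id = span.get("agent_id")
        if agent_id is not None:  # main-session spans carry no agent_id
            parents[agent_id] = span.get("parent_agent_id")
    return parents


def dispatch_chain(spans, agent_id):
    """Return [agent, parent, grandparent, ...] without touching span links."""
    parents = parent_index(spans)
    chain, seen = [], set()
    current = agent_id
    while current is not None and current not in seen:  # guard against cycles
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


failing = [s["agent_id"] for s in SPANS if s["status"] == 429]
print(dispatch_chain(SPANS, failing[0]))  # ['a3', 'a2', 'a1']
```

The lookup never consults span parent links, which is why it keeps working when the records come from different traces.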
With agent_id on every span and API event, the trace store becomes a per-agent analytics surface.
Without the attribute, the same queries require walking parent links per-trace — feasible at one trace, expensive at scale.
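As a rough sketch of what "per-agent analytics" means in practice, the two helpers below filter and aggregate flat span records by `agent_id`. The record shape, sample values, and helper names are assumptions for illustration; a real trace store would express the same thing in its own query language.

```python
from collections import Counter

# Hypothetical flat span records pulled from several traces.
SPANS = [
    {"agent_id": "a1", "status": 200},
    {"agent_id": "a2", "status": 429},
    {"agent_id": "a2", "status": 429},
    {"status": 200},  # main-session span: no agent_id attribute
]


def spans_for_agent(spans, agent_id):
    """'Show me everything this subagent did' as a single flat filter."""
    return [s for s in spans if s.get("agent_id") == agent_id]


def rate_limit_hits_by_agent(spans):
    """Count 429 responses per subagent instance, skipping main-session spans."""
    return Counter(
        s["agent_id"] for s in spans if "agent_id" in s and s["status"] == 429
    )


print(len(spans_for_agent(SPANS, "a2")))  # 2
print(rate_limit_hits_by_agent(SPANS))    # Counter({'a2': 2})
```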
The crucial discipline: agent_id is a per-instance identifier, so it's high cardinality.
Use it as a span/event attribute, never a metric label — per-instance IDs create unbounded time series. Its
metric-safe partner is agent.name, the subagent type, a bounded set.
| Use this | For |
|---|---|
| agent.name | Dashboard aggregation: bounded cardinality, safe as a metric label |
| agent_id | Drilling into one specific incident in the trace store |
It's the same discipline Lesson 02 set for prompt.id: a per-instance correlation key belongs on
spans, not on metrics.
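The cardinality argument can be made concrete with a small, backend-free sketch. It assumes a metrics store keeps one time series per distinct label set; the `HIGH_CARDINALITY_KEYS` set, the helper names, and the two placeholder subagent type names are hypothetical.

```python
# Attributes assumed to identify a single instance, so unsafe as metric labels.
HIGH_CARDINALITY_KEYS = {"agent_id", "parent_agent_id", "prompt.id"}


def metric_labels(span_attrs):
    """Keep only the bounded attributes when deriving metric labels."""
    return {k: v for k, v in span_attrs.items() if k not in HIGH_CARDINALITY_KEYS}


def series_count(label_sets):
    """Distinct label sets, i.e. time series the backend would have to store."""
    return len({tuple(sorted(labels.items())) for labels in label_sets})


# 1,000 hypothetical subagent instances drawn from two placeholder types.
spans = [
    {"agent.name": "type-a" if i % 2 else "type-b", "agent_id": f"agent-{i}"}
    for i in range(1000)
]

print(series_count(spans))                              # 1000
print(series_count([metric_labels(s) for s in spans]))  # 2
```

Labeling by `agent_id` grows the series count with every spawned subagent, while labeling by `agent.name` stays fixed at the number of subagent types.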
The contract holds only over surfaces Claude Code controls. A shell-out via Bash, such as a
curl call, produces a tool span but carries no x-claude-code-agent-id header, so the receiving service sees
an anonymous request. Subprocess work inherits TRACEPARENT but not the agent identity.
Fire-and-forget queues discard the header on enqueue. Close the gap by lifting the call into an
instrumented MCP tool, or by wrapping the shell-out with an explicit header. And mind privacy: agent_id is
opaque, but built-in agent.name values are emitted verbatim. Custom names are redacted to "custom", yet
a skill or plugin name on the same span can still leak intent.
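One way to wrap a shell-out with an explicit header is to build the curl argument list in a helper, as sketched below. The source does not say how a shell-out learns its agent ID, so this sketch simply assumes the caller already has it; the `curl_with_agent_header` helper and the example URL are hypothetical.

```python
import shlex

AGENT_HEADER = "x-claude-code-agent-id"  # header name used on outgoing HTTP


def curl_with_agent_header(url, agent_id, extra_args=()):
    """Build a curl argv that forwards the agent identity explicitly.

    Assumption: the caller already knows its agent_id; how that value
    reaches the shell-out is outside the scope of this sketch.
    """
    if not agent_id:
        raise ValueError("agent_id is required to tag the request")
    return [
        "curl",
        "--silent",
        "--header",
        f"{AGENT_HEADER}: {agent_id}",
        *extra_args,
        url,
    ]


argv = curl_with_agent_header("https://example.internal/health", "agent-42")
print(shlex.join(argv))  # shell-safe rendering of the command, for logging
```

Passing `argv` as a list to `subprocess.run` (without `shell=True`) avoids quoting problems with the header value.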
- agent_id + parent_agent_id together: flat identity plus a back-pointer rebuilds the hierarchy without a tree walk.
- Keep agent_id on spans and events, never on metric labels; aggregate dashboards by agent.name.
- curl, subprocess, and queues drop the header; lift them into instrumented tools.

Retrieval practice: recall, don't peek
Question 1: Span hierarchy alone fails to correlate a subagent's work once…
Question 2: The two propagation surfaces in the contract are…
Question 3: To trace the dispatch chain up to a failing call, you follow…
Question 4: For dashboard aggregation you slice by agent.name, not agent_id, because…
Question 5 (spaced recall from Lesson 08): Failure-aware observability is best described as…
Next, the Capstone: a decision table that ties single-agent and multi-agent observability into one chooser.