Observability · ~7 min
A long-running agent forgets across sessions. A progress file, git commits, and OTel traces give every fresh context window a record of what already happened.
Long-running agents spread decisions across sessions. Without a persistent record, each new session loses the trajectory — what was tried, what failed, what came next — and rebuilds it from scratch, burning tokens and producing inconsistent outcomes. The fix is to leave a trail.
Anthropic's harness-engineering guidance describes a trajectory log built from four artifacts an agent maintains itself: three files plus the git history. No observability backend is required.
| Component | Role in the trail |
|---|---|
| claude-progress.txt | Read at session start, written at end: completed, next, blockers |
| Git commits | Diff-linked checkpoint after each task, queryable via git log |
| feature-state.json | Machine-readable passes/fails snapshot, independent of LLM memory |
| init.sh | Reconstructs the environment; run at startup to confirm known-good state |
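The read-at-start, write-at-end cycle for claude-progress.txt can be sketched as below. This is a minimal illustration, not a prescribed format: the three-section layout with `## ` headings and `- ` items, and the helper names `read_progress` and `write_progress`, are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical layout: one "## <section>" heading per section, one "- item" per line.
SECTIONS = ("completed", "next", "blockers")


def read_progress(path: Path) -> dict[str, list[str]]:
    """Session start: load the trail, or return empty sections if no file exists yet."""
    progress: dict[str, list[str]] = {name: [] for name in SECTIONS}
    if not path.exists():
        return progress
    current = None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line.startswith("## "):
            name = line[3:].strip().lower()
            # Unknown headings are skipped rather than treated as errors.
            current = name if name in SECTIONS else None
        elif line.startswith("- ") and current is not None:
            progress[current].append(line[2:].strip())
    return progress


def write_progress(path: Path, progress: dict[str, list[str]]) -> None:
    """Session end: rewrite the whole file so the next session reads one consistent state."""
    lines: list[str] = []
    for name in SECTIONS:
        lines.append(f"## {name}")
        lines.extend(f"- {item}" for item in progress.get(name, []))
        lines.append("")
    path.write_text("\n".join(lines), encoding="utf-8")
```

A session would call `read_progress(Path("claude-progress.txt"))` before doing any work and `write_progress(...)` as its last step, moving finished items from `next` to `completed`.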
Flip a feature in feature-state.json to passes: true only after verification. The snapshot survives context resets as an independent record; its whole job is to prevent premature completion when the model's memory says "done" but nothing confirmed it.
Commit after each completed task with a descriptive message, and the history becomes a chronological, diff-linked record of every decision, readable by humans and queryable by future sessions. When context is compressed, write the full messages to the filesystem alongside the structured summary so the trajectory is offloaded, not discarded.
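The verify-then-flip rule for feature-state.json can be sketched as follows. The schema (`{"<feature>": {"passes": bool}}`) and the `mark_feature` helper are assumptions for illustration; the source only says the file holds a machine-readable passes/fails snapshot.

```python
import json
from pathlib import Path
from typing import Callable


def mark_feature(state_path: Path, feature: str, verify: Callable[[], bool]) -> bool:
    """Run the check and record its result; the flag is never set without running it."""
    if state_path.exists():
        state = json.loads(state_path.read_text(encoding="utf-8"))
    else:
        state = {}
    try:
        passed = bool(verify())
    except Exception:
        # A crashing check counts as a failure, not as "probably fine".
        passed = False
    # Hypothetical schema: one entry per feature with a single boolean flag.
    state[feature] = {"passes": passed}
    state_path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    return passed
```

The point of the design is that `passes` is only ever written from the return value of a real check (a test run, a health probe), so the model's belief that a feature is done never reaches the file on its own.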
After summarisation, an agent that asks for clarification it already had, or declares premature completion, is signalling goal drift — the trajectory fell out of context. The progress file read at the next session start is what restores it.
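One way to keep the trajectory recoverable after summarisation is to write the full message history to disk at the moment of compression, next to the summary that replaces it. The sketch below assumes messages are JSON-serialisable dicts; the directory layout, file names, and the `offload_context` helper are hypothetical.

```python
import json
from pathlib import Path


def offload_context(trail_dir: Path, messages: list[dict], summary: dict) -> Path:
    """Persist the full messages plus the structured summary; return the summary path."""
    trail_dir.mkdir(parents=True, exist_ok=True)
    # Number each compaction so earlier transcripts are never overwritten.
    index = len(list(trail_dir.glob("transcript-*.jsonl")))
    transcript = trail_dir / f"transcript-{index:04d}.jsonl"
    with transcript.open("w", encoding="utf-8") as fh:
        for message in messages:
            fh.write(json.dumps(message) + "\n")
    # The summary records which transcript it was derived from.
    summary_path = trail_dir / f"summary-{index:04d}.json"
    summary_path.write_text(
        json.dumps({**summary, "transcript": transcript.name}, indent=2),
        encoding="utf-8",
    )
    return summary_path
```

A later session that suspects drift can follow the `transcript` pointer in the summary back to the original messages instead of asking for clarification again.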
Filesystem trails are human-readable. For cost attribution and compliance, Claude Code ships native OpenTelemetry — one env var plus an exporter, no code changes.
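As a rough sketch of what "one env var plus an exporter" looks like, the helper below builds the environment for launching Claude Code with telemetry on. The variable names follow the Claude Code monitoring documentation and the standard OpenTelemetry exporter settings as understood at the time of writing; check them against the current docs. The `telemetry_env` helper and the example endpoint are assumptions.

```python
import os
from typing import Optional


def telemetry_env(endpoint: str, base: Optional[dict] = None) -> dict:
    """Return a copy of the environment with Claude Code OTel export switched on."""
    env = dict(os.environ if base is None else base)
    env.update(
        {
            # The single switch that enables telemetry.
            "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
            # Exporter selection and destination (standard OTel settings).
            "OTEL_METRICS_EXPORTER": "otlp",
            "OTEL_LOGS_EXPORTER": "otlp",
            "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
            "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        }
    )
    return env
```

For example, `subprocess.run(["claude"], env=telemetry_env("http://localhost:4317"))` would start a session exporting to a local collector, assuming one is listening on that port.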
A single prompt triggers dozens of API calls; the shared prompt.id is what makes tracing a cost spike feasible after the fact. Grouping claude_code.cost.usage by user.account_uuid and model gives per-user, per-model attribution, but the values are approximations; reconcile chargebacks against the billing console, not the metric.
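The two queries described above, attribution by user and model and pulling together everything under one prompt.id, can be sketched over already-exported records. The record shape (a dict with `value` and `attributes`) is an assumption about how a backend might hand the data back, not the OTel wire format; only the metric and attribute names come from the text.

```python
from collections import defaultdict


def cost_by_user_and_model(points: list[dict]) -> dict[tuple[str, str], float]:
    """Sum claude_code.cost.usage data points per (user.account_uuid, model)."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for point in points:
        attrs = point["attributes"]
        totals[(attrs["user.account_uuid"], attrs["model"])] += point["value"]
    return dict(totals)


def events_for_prompt(events: list[dict], prompt_id: str) -> list[dict]:
    """Collect every event that shares one prompt.id, in their original order."""
    return [e for e in events if e["attributes"].get("prompt.id") == prompt_id]
```

Because the metric values are approximations, totals computed this way are suited to spotting which user or model drove a spike, not to invoicing.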
The four-file pattern assumes a persistent local working directory. It breaks for serverless or ephemeral agents (filesystem gone on teardown) and for parallel agent pools (concurrent writes to one progress file race). With existing OTel pipelines in place, duplicating the trail in flat files adds maintenance for no new insight.
Flip feature-state.json only after verification; it blocks premature "done."
prompt.id ties a prompt's events together.
Retrieval practice: recall, don't peek
Question 1: The progress file is read…
Question 2: You flip a feature to passes: true…
Question 3: In Claude Code OTel, prompt.id exists to…
Question 4: The filesystem trail pattern breaks down for…
Question 5 · spaced recall from Lesson 01: The cheapest-first ordering of verification signals is…
Next in Part 2: The Log Is the Truth. Event sourcing makes the trail the source of state, not a side effect.