Observability · ~7 min
Stop letting agents write files directly. Have them emit intentions instead; a deterministic orchestrator records each one in an append-only log and applies the effect, so the whole run can be replayed.
Long-horizon agents fail in two ways: context degradation (early decisions fall out of the window, so the agent repeats or contradicts itself) and non-deterministic mutation (the agent writes files directly, so you can't audit, replay, or verify that intention matched effect). ESAA (Event Sourcing for Autonomous Agents) addresses both by separating what the agent intends from what actually gets applied.
The agent — the cognitive layer — produces a structured JSON intention. It never writes a file. A deterministic orchestrator validates the schema, appends the event to an append-only log, then applies the effect.
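The validate, append, apply sequence can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the field names (agent_id, task_id, action, path, content), the single write_file action, and the handle function are all assumptions made for the example.

```python
import json
import time
from pathlib import Path

# Hypothetical intention schema; these field names are illustrative only.
REQUIRED_FIELDS = {
    "agent_id": str,
    "task_id": str,
    "action": str,
    "path": str,
    "content": str,
}
ALLOWED_ACTIONS = {"write_file"}


def validate(intention: dict) -> None:
    """Raise ValueError unless the intention matches the assumed schema."""
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(intention.get(field), kind):
            raise ValueError(f"missing or mistyped field: {field}")
    if intention["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {intention['action']}")
    path = Path(intention["path"])
    if path.is_absolute() or ".." in path.parts:
        raise ValueError("path escapes the workspace")


def handle(intention: dict, log_path: Path, workspace: Path) -> dict:
    """Validate, append to the log, then apply the effect, in that order."""
    validate(intention)  # an invalid intention is rejected before anything is persisted
    event = {**intention, "ts": time.time()}
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event, sort_keys=True) + "\n")  # one event per line
    target = workspace / event["path"]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(event["content"], encoding="utf-8")
    return event
```

The agent only ever builds the dictionary passed to handle; the file write happens inside the orchestrator, after the event is already in the log.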
Because the log is append-only and every effect is deterministic, you can replay the log file, activity.jsonl, and rebuild project state from nothing. If the derived state matches the filesystem, the execution record is verified.
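A replay check along these lines might look like the sketch below. It assumes a hypothetical event shape (one JSON object per line with action, path, and content fields, where write_file is the only action) and assumes the log is stored outside the workspace directory; neither detail comes from the paper.

```python
import json
from pathlib import Path


def replay(log_path: Path) -> dict[str, str]:
    """Fold the event log into a derived state: relative path -> file content."""
    state: dict[str, str] = {}
    with log_path.open(encoding="utf-8") as log:
        for line in log:
            if not line.strip():
                continue  # tolerate blank lines
            event = json.loads(line)
            if event["action"] == "write_file":
                state[event["path"]] = event["content"]  # later events win
    return state


def verify(log_path: Path, workspace: Path) -> bool:
    """True when the files on disk are exactly what the log says they should be."""
    derived = replay(log_path)
    on_disk = {
        p.relative_to(workspace).as_posix(): p.read_text(encoding="utf-8")
        for p in workspace.rglob("*")
        if p.is_file()
    }
    return derived == on_disk
```

A file edited outside the orchestrator, or a file with no matching event, makes verify return False, which is the signal that intention and effect have diverged.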
This gives three properties at once: forensic traceability (every change explained by a logged event with timestamp and agent identity), immutability (no in-place updates, so history can't be silently revised), and reproducibility (same event sequence → same final state). The log is the source of truth; current state is a derived projection.
In the paper's clinical-dashboard case study, four concurrent heterogeneous LLMs worked on different tasks from one roadmap. The orchestrator serializes event persistence and state mutation, so the agents never collide and need no coordination logic of their own. They stay cognitively independent: each emits an intention and receives an updated roadmap.json view.
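One simple way to get that serialization is a single lock around the append, sketched below with threads standing in for agents. The Orchestrator class, the submit method, and the seq field are hypothetical names for this illustration; the paper's orchestrator may serialize differently (for example with a queue or a separate process).

```python
import json
import threading
from pathlib import Path


class Orchestrator:
    """Single writer: agents call submit(); only this class touches the log."""

    def __init__(self, log_path: Path) -> None:
        self._log_path = log_path
        self._lock = threading.Lock()
        self._seq = 0

    def submit(self, agent_id: str, intention: dict) -> int:
        # Holding the lock makes sequence assignment and the append one atomic
        # step, so concurrent agents cannot interleave or reorder events.
        with self._lock:
            self._seq += 1
            event = {**intention, "seq": self._seq, "agent_id": agent_id}
            with self._log_path.open("a", encoding="utf-8") as log:
                log.write(json.dumps(event) + "\n")
            return self._seq
```

The agents share nothing with each other; the only shared object is the orchestrator, which is why they need no coordination logic of their own.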
roadmap.json is continuously rebuilt from the log to show current task status. Agents get this
compact view as context instead of growing conversation history — directly attacking context
degradation. Tasks move todo → in_progress → review → done; done is terminal. Defects get a
new hotfix task, never a rewrite of history.
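The task lifecycle and the projection can be sketched together. The event types (task_created, status_changed) and their fields are assumptions for illustration, and the transition table covers only the forward path stated above; how a review sends work back is left out.

```python
# Forward-only lifecycle from the lesson; "done" has no outgoing transitions.
ALLOWED = {
    "todo": {"in_progress"},
    "in_progress": {"review"},
    "review": {"done"},
    "done": set(),
}


def project(events: list[dict]) -> dict[str, str]:
    """Rebuild task_id -> status from the event sequence (the roadmap view)."""
    roadmap: dict[str, str] = {}
    for event in events:
        task_id = event["task_id"]
        if event["type"] == "task_created":
            if task_id in roadmap:
                raise ValueError(f"duplicate task: {task_id}")
            roadmap[task_id] = "todo"
        elif event["type"] == "status_changed":
            current = roadmap.get(task_id)
            if current is None:
                raise ValueError(f"unknown task: {task_id}")
            if event["to"] not in ALLOWED[current]:
                raise ValueError(f"illegal transition {current} -> {event['to']}")
            roadmap[task_id] = event["to"]
        else:
            raise ValueError(f"unknown event type: {event['type']}")
    return roadmap


events = [
    {"type": "task_created", "task_id": "T1"},
    {"type": "status_changed", "task_id": "T1", "to": "in_progress"},
    {"type": "status_changed", "task_id": "T1", "to": "review"},
    {"type": "status_changed", "task_id": "T1", "to": "done"},
    # A defect found after completion becomes a new task; T1 stays done.
    {"type": "task_created", "task_id": "T1-hotfix"},
]
print(project(events))  # {'T1': 'done', 'T1-hotfix': 'todo'}
```

Because the view is a pure function of the events, it can be thrown away and rebuilt at any time, and its size tracks the number of tasks rather than the length of the conversation.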
ESAA needs an orchestrator process, schema validation, log storage, and a materialized view. For single-session, single-agent tasks, direct mutation is faster — the infrastructure adds latency for no payoff. Early on, a churning schema means constant contract updates and log migrations. And the central orchestrator can become a throughput bottleneck at high concurrency.
Retrieval practice — recall, don't peek
Question 1: Under ESAA, the agent's job is to…
Question 2: "Fail-closed" validation means a malformed event…
Question 3: Replay verification confirms execution by…
Question 4: The roadmap.json materialized view mainly counters…
Question 5 · spaced recall from Lesson 02: You flip feature-state.json to passes: true…
AGENT_CONTRACT.yaml boundary-contract shape, or how the todo → in_progress → review → done state machine handles a reviewer's request_changes?
Next in Part 2: The Four Failure Modes (debugging bad output by layer).