Observability
An agent that only reads code and test output is flying blind. Wire in signals it can see, and it stops guessing whether the fix worked.
An agent writes code, runs tests, and reads the output. For most setups, that is the entire loop. Without help, it cannot do three things: confirm that a UI renders, check whether a fix actually lowered the error rate, or search logs for the pattern a user reported. Lacking those, it operates in "write and hope" mode.
Wiring observability into an agent's context turns "write and hope" into "write, observe, verify." The signals fall into three buckets, each closing a different blind spot.
| Signal | What it answers |
|---|---|
| Visual | Browser automation — did the UI render and behave? |
| Log | Structured, filterable log entries — what error pattern fired, and how often? |
| Metric | Counters and latencies — did error rate or p99 actually move after the change? |
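For the log row, "what error pattern fired, and how often?" reduces to filtering and counting. Below is a minimal sketch that assumes JSON-lines logs with hypothetical `level` and `msg` fields (real schemas vary) and reads from an in-memory list rather than from any particular log backend:

```python
import json
from collections import Counter


def count_error_patterns(lines, level="ERROR"):
    """Count how often each message fires at `level` in JSON-lines logs.

    Assumes hypothetical "level" and "msg" fields; adjust to your log schema.
    """
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip unstructured or truncated lines
        if not isinstance(entry, dict):
            continue  # valid JSON, but not a log record
        if entry.get("level") == level:
            counts[entry.get("msg", "<no msg>")] += 1
    return counts


sample = [
    '{"level": "ERROR", "msg": "auth token expired"}',
    '{"level": "INFO", "msg": "login ok"}',
    '{"level": "ERROR", "msg": "auth token expired"}',
    "not json",
    '{"level": "ERROR", "msg": "db timeout"}',
]
print(count_error_patterns(sample).most_common())
# [('auth token expired', 2), ('db timeout', 1)]
```

Returning counts instead of raw entries keeps the answer small enough to place in the agent's context.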
Not every check costs the same. Start with the cheapest signal that can answer the question, and move to a costlier one only when the cheaper one can't.
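One way to picture the cheapest-first rule is as a ladder of checks sorted by cost. Each check either answers the question or reports that it can't. This is a hypothetical sketch: the rung names follow the ladder this lesson uses (`python -c`, `curl`, browser, vision), but the numeric costs and the stub checks are invented for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A rung is (name, relative cost, check). A check returns True or False when it
# can answer the question, or None when this rung can't tell.
Rung = Tuple[str, int, Callable[[], Optional[bool]]]


def climb(rungs: List[Rung]) -> Tuple[str, bool]:
    """Run checks cheapest-first and stop at the first conclusive answer."""
    for name, _cost, check in sorted(rungs, key=lambda rung: rung[1]):
        result = check()
        if result is not None:
            return name, result
    raise RuntimeError("no rung could answer the question")


# Hypothetical costs and stub checks for "did the dashboard render?"
rungs = [
    ("vision", 4, lambda: True),
    ("python -c", 1, lambda: None),  # a script can't see rendered UI
    ("curl", 2, lambda: None),       # HTML arrived, but did it render?
    ("browser", 3, lambda: True),    # snapshot shows the heading
]
print(climb(rungs))  # ('browser', True)
```

The vision check never runs here because the browser snapshot already answered. That skipped call is where the savings come from.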
An agent fixing a login bug works through the signals in order: logs name the failing call, tests confirm the code fix, a browser snapshot shows the dashboard heading appeared, and a metric query shows errors fell from 312 to 3. The loop closed because each layer was legible to the agent.
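The last step of that story is simple arithmetic. Below is a hedged sketch of the metric check. `fix_verified` and its 90% threshold are illustrative assumptions, not a standard. The comparison also only makes sense if the before and after counts cover windows of equal length with similar traffic.

```python
def fix_verified(before: int, after: int, min_drop: float = 0.9) -> bool:
    """True if errors fell by at least `min_drop` (a fraction of `before`).

    Assumes both counts cover equal-length windows with comparable traffic.
    The 0.9 default (a 90% drop) is an illustrative threshold, not a standard.
    """
    if before <= 0:
        return False  # no baseline errors, so there is no drop to verify
    drop = (before - after) / before
    return drop >= min_drop


print(round((312 - 3) / 312, 3))  # 0.99
print(fix_verified(312, 3))       # True
```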
Observability data is high-volume: a careless log query can return thousands of entries and flood the context window. The discipline is just-in-time (JIT) references: store the query string and time range, and load the payload only when you need it to verify.
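A JIT reference can be as small as a query plus a time range. The payload is fetched, and truncated, only when the agent verifies. In this sketch, `LogRef`, the query syntax, and the `fetch` callable (standing in for whatever tool a log MCP server exposes) are all hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List

Fetch = Callable[[str, datetime, datetime], List[str]]


@dataclass(frozen=True)
class LogRef:
    """Just-in-time reference: keep the query and range, not the entries."""
    query: str
    start: datetime
    end: datetime

    def summary(self) -> str:
        # This short string is all that needs to sit in the agent's context.
        return f"logs {self.query!r} from {self.start.isoformat()} to {self.end.isoformat()}"

    def load(self, fetch: Fetch, limit: int = 50) -> List[str]:
        # Resolve only at verification time, and cap what reaches the context.
        return fetch(self.query, self.start, self.end)[:limit]


def fake_fetch(query: str, start: datetime, end: datetime) -> List[str]:
    # Stand-in for a real log backend that returns far too many entries.
    return [f"{query} #{i}" for i in range(1000)]


end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ref = LogRef('level:ERROR msg:"login"', end - timedelta(minutes=15), end)
print(ref.summary())
print(len(ref.load(fake_fetch)))  # 50
```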
If the log or metric MCP server is down, the agent loses that signal entirely and may silently proceed, reading an empty result as "no errors." Stale or sampled metrics mislead the same way: an agent that queries the error rate 30 seconds after a deploy may read pre-deploy data and wrongly conclude the fix worked.
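Both failure modes can be guarded against by refusing to treat silence or early data as success. Below is a minimal sketch under stated assumptions. `MetricReading` and `verdict` are hypothetical. The default five-minute window assumes the error rate is aggregated over that window, so a sample taken sooner after the deploy still includes pre-deploy traffic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class MetricReading:
    value: Optional[float]           # None when the backend returned nothing
    sampled_at: Optional[datetime]   # end of the aggregation window


def verdict(reading: MetricReading, deployed_at: datetime, max_rate: float,
            window: timedelta = timedelta(minutes=5)) -> str:
    """Classify a post-deploy error-rate reading without reading silence as success."""
    if reading.value is None or reading.sampled_at is None:
        return "unknown: no data"  # a down server or empty result is not "no errors"
    if reading.sampled_at < deployed_at + window:
        return "unknown: stale"    # the window still overlaps pre-deploy traffic
    return "pass" if reading.value <= max_rate else "fail"


deploy = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(verdict(MetricReading(None, None), deploy, 0.01))  # unknown: no data
print(verdict(MetricReading(0.002, deploy + timedelta(seconds=30)), deploy, 0.01))  # unknown: stale
print(verdict(MetricReading(0.002, deploy + timedelta(minutes=10)), deploy, 0.01))  # pass
```

Returning "unknown" instead of a boolean forces the agent to retry or escalate rather than proceed on a false "all clear."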
The verification ladder, cheapest rung first: `python -c` → `curl` → browser → vision.

Retrieval practice: recall, don't peek
1. An agent that only reads code and test output operates in…
2. For a functional UI check, the cheaper, model-readable signal is…
3. The verification ladder says you should start with…
4. A JIT reference means the agent stores the…
5. If the log MCP server goes down mid-task, the danger is the agent…