Tool Engineering · ~7 min
Sizing output was the budget. Shaping it is the craft: which fields, in what language, paginated how — and what to do when even the trimmed result overflows.
The agent reasons over your output as natural language. So output format is a reliability lever independent of model capability: the same tool, reshaped, produces fewer hallucinations and more accurate downstream calls — without touching the model.
UUIDs, MIME types, and internal codes are arbitrary byte sequences to a model. Resolving them to natural-language fields measurably reduces hallucination in retrieval tasks — the agent can reference the object by name instead of miscopying a 36-character UUID.
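As a minimal sketch of that resolution step, the function below rewrites a raw file record into named fields before it reaches the agent. The lookup tables, the field names, and `humanize_file_record` itself are hypothetical stand-ins, not part of any particular tool.

```python
# Hypothetical lookup tables; a real tool would query its own user and type stores.
USER_NAMES = {"3f2b8c1e-9a4d-4e7b-b1c2-5d6e7f8a9b0c": "Dana Okafor"}
MIME_LABELS = {"application/pdf": "PDF document"}


def humanize_file_record(record: dict) -> dict:
    """Return a copy of a raw file record with opaque identifiers resolved to names."""
    owner_id = record["owner_id"]
    return {
        "file_name": record["name"],
        # Fall back to the raw value rather than inventing a label.
        "owner": USER_NAMES.get(owner_id, f"unknown user ({owner_id})"),
        "type": MIME_LABELS.get(record["mime"], record["mime"]),
        # The raw ID stays available for follow-up calls, but is no longer the only handle.
        "file_id": record["id"],
    }


raw = {
    "id": "a81d4fae-7dec-41d0-a765-00a0c91e6bf6",
    "name": "Q3 renewal contract",
    "owner_id": "3f2b8c1e-9a4d-4e7b-b1c2-5d6e7f8a9b0c",
    "mime": "application/pdf",
}
print(humanize_file_record(raw))
```

The agent can now refer to "Q3 renewal contract, owned by Dana Okafor" and only needs the ID when it makes a follow-up call.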
A get_customer that returns twelve fields when the agent needs three forces it to parse nine
irrelevant ones, and every extra field is a hallucination opportunity: the agent may reference,
invent, or misattribute stripe_id or feature_flags in an email that had no business
mentioning them. Design the schema around decision-relevant fields. Keep developer-debugging output behind a separate
debug mode; don't make the firehose the default an agent sees.
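One way this could look, assuming a hypothetical in-memory store and an assumed set of three decision-relevant fields (the record is shortened here for brevity):

```python
# Hypothetical record store; field names other than stripe_id and feature_flags are invented.
_CUSTOMERS = {
    "c_1": {
        "name": "Acme Corp",
        "plan": "enterprise",
        "status": "active",
        "stripe_id": "cus_placeholder",
        "feature_flags": ["beta_reports"],
        "created_at": "2023-01-15",
    }
}

# Assumption: these are the fields the agent's decisions actually depend on.
DECISION_FIELDS = ("name", "plan", "status")


def get_customer(customer_id: str, debug: bool = False) -> dict:
    """Return only decision-relevant fields unless debug output is requested."""
    record = _CUSTOMERS[customer_id]
    if debug:
        # Full record for developers; not the default an agent sees.
        return dict(record)
    return {field: record[field] for field in DECISION_FIELDS}


print(get_customer("c_1"))
```

The default call never exposes stripe_id or feature_flags, so the agent has nothing to misattribute; a developer can still pass `debug=True` to see the whole record.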
A tool that returns the full dataset pushes filtering onto the agent, which then either hallucinates the filter or loads everything into context. Move that work into the tool.
| Move | What it does |
|---|---|
| Filter params (status=open) | Apply before returning; agent doesn't filter in its head |
| Pagination with a cursor | Return one page + a handle, not an unbounded list |
| Sensible defaults (limit=20) | Prevent accidental context flooding |
| response_format enum | Let the agent pick concise vs detailed per its budget |
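The four moves in the table can be combined in a single tool signature. The sketch below is illustrative only: `list_tickets`, the in-memory `TICKETS` list, and the offset-style cursor are assumptions, and a production tool would more likely use an opaque cursor token.

```python
from enum import Enum


class ResponseFormat(str, Enum):
    CONCISE = "concise"
    DETAILED = "detailed"


# Hypothetical data set: 100 tickets, odd IDs open and even IDs closed.
TICKETS = [
    {"id": i, "title": f"Ticket {i}", "status": "open" if i % 2 else "closed", "body": "..."}
    for i in range(1, 101)
]


def list_tickets(status=None, limit=20, cursor=0, response_format="concise"):
    """Filter inside the tool, then return one page plus a handle for the next."""
    fmt = ResponseFormat(response_format)  # rejects values outside the enum
    matches = [t for t in TICKETS if status is None or t["status"] == status]
    page = matches[cursor:cursor + limit]
    next_cursor = cursor + limit if cursor + limit < len(matches) else None
    if fmt is ResponseFormat.CONCISE:
        items = [{"id": t["id"], "title": t["title"]} for t in page]
    else:
        items = page
    return {"items": items, "next_cursor": next_cursor, "total_matches": len(matches)}


first_page = list_tickets(status="open")
print(len(first_page["items"]), first_page["next_cursor"])
```

Because the defaults are safe, an agent that calls `list_tickets()` with no arguments still gets a bounded page rather than the full list.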
Even filtered output can blow the token budget. The wrong answer is a hard error — the agent spends a turn, gains nothing, and has to guess retry parameters. The right answer is a graceful truncation contract with three load-bearing parts. Drop any one and it collapses back into the failure it was meant to fix.
A bottom-of-output [PARTIAL] line is the easiest implementation and the most
ignored: the model reads the prefix as the whole document and the trailing line as commentary.
Anthropic's own Read tool hit this as a bug — the agent treated a preview as the complete file and silently
dropped the un-read rules. Put the marker in a structurally distinct slot: a leading banner, a separate
JSON field, or typed metadata.
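A minimal sketch of the "separate JSON field" option follows. The function name and every field name are assumptions chosen for illustration; the sketch only shows where the marker lives, not the full truncation contract.

```python
import json


def read_with_truncation(text: str, max_chars: int = 2000) -> dict:
    """Return content with truncation metadata in its own fields, ahead of the content."""
    truncated = len(text) > max_chars
    return {
        # Metadata comes first so it is read before the (possibly partial) content.
        "truncated": truncated,
        "returned_chars": min(len(text), max_chars),
        "total_chars": len(text),
        "next_offset": max_chars if truncated else None,
        "content": text[:max_chars],
    }


result = read_with_truncation("rule\n" * 1000, max_chars=2000)
print(json.dumps({k: v for k, v in result.items() if k != "content"}))
```

Python dictionaries preserve insertion order, so the serialized JSON leads with `truncated` rather than burying it after the content.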
An error is output too. "400 Bad Request" tells the agent nothing; "Invalid date range:
end_date must be after start_date. Received start=2024-03-01, end=2024-02-01." tells it exactly what to fix.
Actionable errors let the agent self-correct on the next call without a human — the topic Lesson 5 takes apart in
full.
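A small sketch of how a tool might produce the second kind of message. `check_date_range` and its return shape are hypothetical; the point is that the error names the rule and echoes the received values.

```python
from datetime import date


def check_date_range(start_date: str, end_date: str) -> dict:
    """Validate a date range and explain any failure in terms the caller can act on."""
    try:
        start = date.fromisoformat(start_date)
        end = date.fromisoformat(end_date)
    except ValueError:
        return {
            "ok": False,
            "error": "Invalid date format: expected YYYY-MM-DD. "
                     f"Received start={start_date}, end={end_date}.",
        }
    if end <= start:
        return {
            "ok": False,
            "error": "Invalid date range: end_date must be after start_date. "
                     f"Received start={start_date}, end={end_date}.",
        }
    return {"ok": True, "error": None}


print(check_date_range("2024-03-01", "2024-02-01")["error"])
```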
A task-specific schema that omits a field the agent unexpectedly needs forces a hallucination or an extra round-trip — sometimes pricier than returning the full record once. And a graceful tool paired with an agent that has no reaction policy for the marker is worse than a hard error: the system prompt must say what to do (paginate, summarise, ask, proceed). For security-critical reads — policy files, guardrails — fail-closed beats return-prefix: a silently truncated rules file is the same as a missing one.
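The fail-closed case can be sketched as a read that refuses to return a prefix at all. `read_policy` and `TruncatedPolicyError` are hypothetical names, and the character limit stands in for whatever budget the tool actually enforces.

```python
class TruncatedPolicyError(Exception):
    """Raised instead of returning a partial security-critical document."""


def read_policy(text: str, max_chars: int) -> str:
    """Return the whole policy or nothing; never a silent prefix."""
    if len(text) > max_chars:
        raise TruncatedPolicyError(
            f"Policy is {len(text)} characters but the read limit is {max_chars}. "
            "Refusing to return a partial policy; raise the limit or read it in sections."
        )
    return text


try:
    read_policy("rule\n" * 100, max_chars=50)
except TruncatedPolicyError as exc:
    print(exc)
```

The error still tells the agent what went wrong and what to try next, so failing closed does not have to mean failing silently.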
debug or detailed mode.
Retrieval practice: recall, don't peek
Question 1: Resolving a UUID to a natural-language name mainly reduces…
Question 2: Filtering and pagination belong…
Question 3: A trailing [PARTIAL] line tends to be ignored because the model…
Question 4: For a guardrail or policy-file read, truncation should…
Question 5 · spaced recall from Lesson 03: The right design question for tool output is…
response_format enums pair with a "reason about your data needs first" prompt?
Next in Part 2: Errors as a Teaching Signal, turning failures into the agent's next correct move.