Tool Engineering · ~6 min
A tool that works perfectly when you call it can still be unusable by an agent. The difference is not the implementation — it's the surface.
Here's the trap. You wrap an existing function, the unit test passes, and the agent still calls it wrong — wrong parameters, wrong tool, output it can't parse. The instinct is to patch the prompt. That's treating a tool defect as a prompt problem.
An agent's only contact with your tool is the literal bytes of the surface — the name, the schema, the description, the output, the error. It has no intuition to repair ambiguity. A human developer fills gaps from experience; the agent fills them with a guess, and guesses wrong at a rate proportional to the ambiguity. This is the discipline Netlify named Agent Experience (AX): designing the surface so an agent consumer can discover it, invoke it, and recover from it.
Anthropic's guidance is explicit: tool design deserves the same investment as prompt engineering. On SWE-bench they report spending more time optimizing tools than the overall prompt — because that's where the reliability lived.
"Agent-friendly" decomposes into five concrete surfaces. Each is a place the agent can succeed or stall, and each is a later lesson in this course.
| Surface | The question the agent is asking |
|---|---|
| Name & description | Is this the right tool, and when do I prefer it over the others? |
| Schema | What exactly do I pass, and in what format? |
| Output | What in this response do I act on next? |
| Errors | What went wrong, and what do I try instead? |
| The set | Among all my tools, which combination achieves the goal? |
A tool is agent-friendly when all five questions have an answer the agent can read off the surface without guessing. Miss one and you get a failure mode that looks like a reasoning problem but isn't.
The default move — wrap an API endpoint one-to-one, return its raw response — produces a tool shaped for the API, not the agent. Design agent tools like APIs you ship to a new hire: documentation, a concrete example, edge-case handling, and mistake-proofing. The contrast below is the whole course in miniature.
The agent using the second version knows the valid file_type values without trial and error, can't
flood its own context, and gets a result it can act on. None of that is the implementation — it's the surface.
The full pattern pays off on stable, shared tools called across many sessions. For a one-off exploratory script, or a tool wrapping a well-documented API the model already has strong priors for, a thin wrapper can outperform a heavy custom docstring — heavyweight docs on an unstable interface drift from reality and mislead more than a terse stub. Engineer the surface where the tool is durable.
Retrieval practice — recall, don't peek
Question 1When an agent keeps misusing a tool, the first thing to audit is…
Question 2An agent reading your tool surface has…
Question 3"Agent quality is bounded by tool quality" means…
Question 4Heavy tool documentation is the wrong call when the tool is…
Question 5The surface that answers "which tool, and when do I prefer it?" is the…