Part 1 · Foundations

Tool Engineering · ~6 min

What Makes a Tool Agent-Friendly

A tool that works perfectly when you call it can still be unusable by an agent. The difference is not the implementation — it's the surface.

Why this, for you: the single highest-leverage fix when an agent keeps misusing your tools. Before you touch the prompt, look at the tool. Agent quality is bounded by tool quality — and the bound is set by how the tool reads, not what it does. This is the lens the rest of the course sharpens.

Here's the trap. You wrap an existing function, the unit test passes, and the agent still calls it wrong — wrong parameters, wrong tool, output it can't parse. The instinct is to patch the prompt. That's treating a tool defect as a prompt problem.

Agent quality is bounded by tool quality. No prompt compensates for a tool interface the model cannot use reliably. Wrong selection, wrong parameters, and misread output are recurring failure modes in agent loops, and the prompt is the wrong place to fix them.

1 The agent reads your surface literally

An agent's only contact with your tool is the literal bytes of the surface: the name, the schema, the description, the output, the error. It has no intuition to repair ambiguity. A human developer fills gaps from experience; the agent fills them with a guess, and the more ambiguous the surface, the more often that guess is wrong. This is the discipline Netlify named Agent Experience (AX): designing the surface so an agent consumer can discover it, invoke it, and recover when a call fails.
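To make "the literal bytes of the surface" concrete, here is a minimal sketch of what a model might receive for one tool. The `name` / `description` / `input_schema` layout follows the JSON Schema style that common tool-calling APIs use. The tool itself (`search_codebase`), its parameters, and every description string are illustrative assumptions, not a real API.

```python
import json

# Hypothetical tool definition. Everything the agent knows about the tool
# is in this dictionary; the implementation behind it is invisible.
TOOL_DEFINITION = {
    "name": "search_codebase",
    "description": (
        "Search source files for a text query. Use this to locate where a "
        "symbol or string appears before opening a file."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "search_query": {
                "type": "string",
                "description": "Literal text to look for, e.g. 'def parse_config'.",
            },
            "file_type": {
                "type": "string",
                "enum": ["python", "typescript", "all"],
                "description": "Restrict the search to one language, or 'all'.",
            },
        },
        "required": ["search_query", "file_type"],
    },
}


def render_surface(tool: dict) -> str:
    """Serialize the definition roughly as it would travel with a prompt."""
    return json.dumps(tool, indent=2)


print(render_surface(TOOL_DEFINITION))
```

If a fact about the tool is not in that printed text, the agent does not have it.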

Same investment as prompt engineering

Anthropic's guidance is explicit: tool design deserves the same investment as prompt engineering. On SWE-bench they report spending more time optimizing tools than the overall prompt — because that's where the reliability lived.

2 Five surfaces, five questions

"Agent-friendly" decomposes into five concrete surfaces. Each is a place the agent can succeed or stall, and each is a later lesson in this course.

Surface            | The question the agent is asking
Name & description | Is this the right tool, and when do I prefer it over the others?
Schema             | What exactly do I pass, and in what format?
Output             | What in this response do I act on next?
Errors             | What went wrong, and what do I try instead?
The set            | Among all my tools, which combination achieves the goal?

A tool is agent-friendly when all five questions have an answer the agent can read off the surface without guessing. Miss one and you get a failure mode that looks like a reasoning problem but isn't.
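The five questions can be turned into a rough checklist. The sketch below is one possible way to do that, not a standard: the `ToolSurface` record, its field names, and the thresholds (such as "fewer than five words counts as an empty description") are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSurface:
    """Hypothetical record of what one tool exposes to an agent."""
    name: str
    description: str = ""
    parameters: dict = field(default_factory=dict)     # param name -> its description
    output_documented: bool = False                    # does the surface say what comes back?
    errors_suggest_next_step: bool = False             # do errors tell the agent what to try?
    overlaps_with: list = field(default_factory=list)  # names of look-alike tools


def audit(tool: ToolSurface) -> list:
    """Return the surfaces where the agent would have to guess."""
    gaps = []
    if len(tool.description.split()) < 5:      # arbitrary cutoff for "says nothing"
        gaps.append("name & description")
    if any(not doc for doc in tool.parameters.values()):
        gaps.append("schema")
    if not tool.output_documented:
        gaps.append("output")
    if not tool.errors_suggest_next_step:
        gaps.append("errors")
    if tool.overlaps_with:
        gaps.append("the set")
    return gaps


# A thin wrapper with a one-line docstring and undocumented parameters.
wrapper = ToolSurface(
    name="search",
    description="Search the codebase.",
    parameters={"query": "", "type": "", "limit": ""},
)
print(audit(wrapper))
# ['name & description', 'schema', 'output', 'errors']
```

Each entry in the returned list names a surface to fix in the tool before you reach for the prompt.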

3 Don't write a boilerplate wrapper

The default move — wrap an API endpoint one-to-one, return its raw response — produces a tool shaped for the API, not the agent. Design agent tools like APIs you ship to a new hire: documentation, a concrete example, edge-case handling, and mistake-proofing. The contrast below is the whole course in miniature.

    from typing import Literal

    # Boilerplate wrapper: vague params, raw passthrough
    def search(query, type, limit):
        """Search the codebase."""
        return _run_search(query, type, limit)  # returns thousands of raw items

    # Engineered tool: enumerated type, bounded limit, summarised, structured error
    def search_codebase(
        search_query: str,
        file_type: Literal["python", "typescript", "all"],  # can't pass a typo
        max_results: int = 20,  # bounded 1-100, sensible default
    ) -> dict:
        # returns match_count + a page, not the firehose
        ...

The agent using the second version knows the valid file_type values without trial and error, can't flood its own context, and gets a result it can act on. None of that is the implementation — it's the surface.
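The engineered signature above stops at the surface, so here is one way its body could look. Treat it as a sketch under stated assumptions: the in-memory `_FAKE_INDEX` stands in for a real search backend, and the return keys other than `match_count` (`ok`, `results`, `truncated`, `error`, `hint`) are names invented for this example.

```python
from typing import Literal, get_args

FileType = Literal["python", "typescript", "all"]
MAX_RESULTS_LIMIT = 100

# Stand-in for a real index: (path, language, line text). Purely illustrative.
_FAKE_INDEX = [
    ("app/config.py", "python", "def parse_config(path):"),
    ("app/cli.py", "python", "cfg = parse_config(args.path)"),
    ("web/config.ts", "typescript", "export function parseConfig() {}"),
]


def search_codebase(search_query: str, file_type: FileType, max_results: int = 20) -> dict:
    """Search indexed source lines for `search_query`.

    Success: {"ok": True, "match_count": int, "results": [...], "truncated": bool}
    Failure: {"ok": False, "error": str, "hint": str}
    """
    allowed = get_args(FileType)
    if file_type not in allowed:
        # Structured error: say what went wrong and what to try instead.
        return {
            "ok": False,
            "error": f"Unknown file_type {file_type!r}.",
            "hint": f"Use one of: {', '.join(allowed)}.",
        }
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        return {
            "ok": False,
            "error": f"max_results={max_results} is out of range.",
            "hint": f"Pass a value between 1 and {MAX_RESULTS_LIMIT}.",
        }

    matches = [
        {"path": path, "line": text}
        for path, lang, text in _FAKE_INDEX
        if search_query in text and file_type in ("all", lang)
    ]
    return {
        "ok": True,
        "match_count": len(matches),              # total hits, even if the page is shorter
        "results": matches[:max_results],         # one bounded page, not the firehose
        "truncated": len(matches) > max_results,  # tells the agent there is more
    }
```

A call such as `search_codebase("parse_config", "py")` returns the error dictionary with a hint listing the valid values, so the agent can correct itself in the next step instead of guessing again.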

When this is overhead, not investment

The full pattern pays off on stable, shared tools called across many sessions. For a one-off exploratory script, or a tool wrapping a well-documented API the model already has strong priors for, a thin wrapper can outperform a heavy custom docstring — heavyweight docs on an unstable interface drift from reality and mislead more than a terse stub. Engineer the surface where the tool is durable.

↪ Your win: fix the tool before you fix the prompt

Retrieval practice — recall, don't peek

Question 1: When an agent keeps misusing a tool, the first thing to audit is…

Question 2: An agent reading your tool surface has…

Question 3: "Agent quality is bounded by tool quality" means…

Question 4: Heavy tool documentation is the wrong call when the tool is…

Question 5: The surface that answers "which tool, and when do I prefer it?" is the…

Next in Part 1: Schema & Description Altitude, the surface that decides whether the agent even picks your tool.