Tool Engineering · ~7 min
The agent never browses your tool catalog. It selects by reasoning over descriptions — so the description is where you win or lose the call, before any work begins.
Selection is a reasoning step, not a lookup. The agent reads the descriptions in its context and picks the tool whose description best matches its current intent. A tool whose description fails to communicate a use case is invisible for that use case — even when the implementation would have handled it.
other_tool when Y." Those are instructions to the agent, not documentation of the interface.A terse reference is enough for a developer who already knows the system. The agent knows nothing — it cannot
infer what "user" means in your domain, whether a date is ISO 8601 or a Unix timestamp, or that you must call
list_sprints before you have a sprint_id. Write as if onboarding a competent new hire on
day one: explicit about the things docs omit because experienced users already know them.
Three things a new hire learns day one — IDs are numeric strings, call list_sprints first, status
takes those exact strings — are now on the surface. The terse version forces a guess on each.
Description tells the agent how to call correctly. The schema can make the wrong call impossible. This is poka-yoke — mistake-proofing borrowed from Toyota: redesign so the defect can't occur.
| Mechanism | Schema move | Example |
|---|---|---|
| Contact — shape blocks misuse | Enumerate valid values | ["python","typescript","all"], not free text |
| Fixed-value — bound the range | Clamp with a default | max_results 1–100, default 20 |
| Motion-step — enforce order | Prerequisite gate | Edit rejects a file not yet read |
Unambiguous names do the same work: user_id not user, start_date not
date — the name carries the type and format so the agent can't substitute the wrong thing. And keep
formats close to training data (JSON, markdown, prose); inputs that need line counting or string-escaping raise
error rates. Adding concrete sample calls to a tool definition moved accuracy from 72% to 90% on
complex parameter handling in Anthropic's testing.
There's a level above the schema, too. A description that over-prescribes — "always call
list_sprints first" — strips the agent of valid paths when the ID is already known. Describe what a
parameter requires, not how to obtain it from scratch every time. State the contract and the
constraints; leave the sequencing to the agent's reasoning, which has information the schema author didn't.
Too low (prescriptive sequence baked into the description): brittle — breaks when the task varies. Too high (vague "search for things"): the agent burns tokens resolving the ambiguity before it can call. The right altitude is the minimum detail that prevents misuse — no more, because every extra token is paid on every invocation.
Retrieval practice — recall, don't peek
Question 1The most common tool-description failure is one that states what the tool does but not…
Question 2Replacing a free-text parameter with an enum is an example of…
Question 3The "onboarding" framing says to write descriptions as if the reader is…
Question 4Baking "always call list_sprints first" into a description risks making the tool…
Question 5 · spaced recall from Lesson 01When an agent keeps misusing a tool, the first place to look is…