Reference · Canonical Language

Tool Engineering — Glossary

The working vocabulary for the Tool Engineering course. Once a term lives here, every lesson uses this word for it. Grows as we go.

The Discipline

Tool engineering: Designing the tools agents use to act on the world — name, schema, description, output, errors, and the set as a whole — so an agent can select, invoke, and recover from them reliably. Agent quality is bounded by tool quality.; Avoid: "writing API wrappers" — a boilerplate wrapper is exactly what this discipline is not.; Source: tool-engineering
Tool surface: The literal bytes an agent reads to use a tool: name, schema, description, output, error. The agent reasons over the surface — not the implementation — so the surface is where reliability is won or lost.; Source: tool-engineering
Agent Experience · AX: The discipline of designing an external surface (SDK, CLI, API, docs) so an AI agent consumer can discover, invoke, and recover from it. The agent reads literally and has no intuition to repair ambiguity.; Avoid: conflating with harness engineering — AX is the surface you ship, not the agent's own scaffold.; Source: designing-for-agent-consumers
Boilerplate wrapper: A tool that mirrors an API endpoint one-to-one and returns its raw response. Shaped for the API, not the agent — the default mistake tool engineering corrects.; Source: tool-engineering
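A minimal sketch of the difference, assuming a hypothetical file-metadata endpoint. The wrapper passes the raw record through. The shaped tool returns names and formatted values and keeps internal IDs behind a debug flag. The record fields, the `USERS` lookup, and both function names are invented for illustration.

```python
from datetime import datetime, timezone

# Hypothetical raw record, as a file-metadata endpoint might return it.
RAW_FILE = {
    "id": "3f9c2a1e-8b7d-4c1a-9e2f-0a1b2c3d4e5f",
    "name": "Q2 forecast.xlsx",
    "owner_id": "u-7d1e",
    "mime_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "size_bytes": 1_536_000,
    "modified_ts": 1_714_521_600,  # Unix seconds
    "etag": 'W/"a1b2"',
}
USERS = {"u-7d1e": "Priya Natarajan"}  # hypothetical ID-to-name directory
MIME_LABELS = {
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "Excel spreadsheet",
    "application/pdf": "PDF document",
}


def get_file_wrapper(raw: dict) -> dict:
    """Boilerplate wrapper: mirrors the endpoint and returns the raw response."""
    return raw


def get_file_shaped(raw: dict, debug: bool = False) -> dict:
    """Shaped tool: names and formatted values the agent can reason over."""
    modified = datetime.fromtimestamp(raw["modified_ts"], tz=timezone.utc)
    result = {
        "name": raw["name"],
        "owner": USERS.get(raw["owner_id"], "unknown owner"),
        "type": MIME_LABELS.get(raw["mime_type"], raw["mime_type"]),
        "size": f"{raw['size_bytes'] / 1_000_000:.1f} MB",
        "modified": modified.strftime("%Y-%m-%d"),
    }
    if debug:  # internal IDs and debug fields only on explicit request
        result["raw"] = raw
    return result
```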

Schema & Description

Selection signal: The "use this when X, prefer over other_tool when Y" guidance in a description that lets the agent choose between plausible tools. An instruction to the agent, not documentation of the interface.; Avoid: a bare capability statement — accurate about what the tool does but silent on when to prefer it.; Source: tool-description-quality
Onboarding description: A tool description written as if training a competent new hire: explicit about implicit context — domain terms, ID formats, valid filter values, resource relationships — that a terse API reference omits.; Source: tool-descriptions-as-onboarding
Poka-yoke · mistake-proofing: Redesigning the schema so the wrong call cannot be made: enumerated values over free text, bounded ranges with defaults, prerequisite gates (read-before-write), unambiguous names (user_id, not user).; Avoid: "validation" — validation rejects bad input at runtime; poka-yoke removes it from the interface entirely.; Source: poka-yoke-agent-tools
Tool altitude: How much of the call sequence a description prescribes. Too low (hard-coded steps) is brittle when the task varies; too high (vague) burns tokens resolving ambiguity. State requirements and constraints; leave sequencing to the agent.; Source: tool-minimalism
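A sketch of poka-yoke and a selection signal together in one tool definition. The `search_tickets` tool, its fields, the sibling tool `get_ticket`, and the ID formats are all hypothetical. The name / description / JSON Schema layout mirrors the shape common to LLM tool-calling APIs, but field names differ by provider (for example `input_schema` vs `parameters`), so check yours.

```python
# Hypothetical tool definition; only the JSON Schema keywords are standard.
SEARCH_TICKETS = {
    "name": "search_tickets",
    # Selection signal: when to use this tool, and when to prefer another one.
    "description": (
        "Search support tickets by status and assignee. Use this when you need a "
        "list of tickets matching filters; prefer get_ticket when you already have "
        "a ticket_id (format 'TCK-12345')."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            # Enumerated values instead of free text: an invalid status can't be expressed.
            "status": {"type": "string", "enum": ["open", "pending", "closed"]},
            # Unambiguous name: an internal ID, not a display name or email.
            "assignee_user_id": {
                "type": "string",
                "description": "Internal user ID, e.g. 'U-0042'.",
            },
            # Bounded range with a default, so the agent can't ask for 10,000 rows.
            "limit": {"type": "integer", "minimum": 1, "maximum": 50, "default": 10},
        },
        "required": ["status"],
        "additionalProperties": False,
    },
}


def with_defaults(args: dict, schema: dict = SEARCH_TICKETS["input_schema"]) -> dict:
    """Server-side convenience: fill in schema defaults the agent omitted."""
    filled = dict(args)
    for field, spec in schema["properties"].items():
        if field not in filled and "default" in spec:
            filled[field] = spec["default"]
    return filled
```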

Output & Shaping

Context injection: The framing that every tool result enters the context window and competes for attention. The design question becomes "what does the agent need to decide next?", not "what can I return?"; Source: token-efficient-tool-design
Paragraph heuristic: The sizing rule of thumb: tool output should fit in a paragraph. If it doesn't, the tool returns too much (filter it), the task truly needs it (structure it), or it's re-readable bulk (offload it to a file).; Source: token-efficient-tool-design
Semantic output: Returning natural-language fields the agent can reason over — names and formatted values — instead of opaque identifiers (UUIDs, MIME types). Reduces hallucination in retrieval tasks by removing transcription risk.; Avoid: developer-convenience output — the raw record with internal IDs and debug fields; gate that behind a debug mode.; Source: semantic-tool-output
PARTIAL contract: The graceful-truncation shape for output that overflows the budget: a useful prefix + a structurally distinct marker (a leading banner, not trailing prose) + a continuation handle (offset, cursor, page token). All three, or it collapses to a hard error.; Avoid: a trailing [PARTIAL] line — the model reads the prefix as complete and the line as commentary.; Source: graceful-tool-output-truncation
Response-format enum: A parameter (concise / detailed) that lets the agent pick output depth against its current context budget, instead of the tool always returning full or minimal.; Source: semantic-tool-output
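A minimal sketch of the PARTIAL contract combined with a response-format enum, assuming a hypothetical ticket-listing tool whose budget is measured in characters rather than tokens. The banner wording and the `offset` continuation handle are illustrative choices. What matters is the order: banner first, then the prefix. When not even one row fits, the sketch raises an actionable error instead of returning an empty "partial" result.

```python
from typing import Literal


def render_rows(rows: list, response_format: Literal["concise", "detailed"]) -> list:
    """Response-format enum: the agent picks output depth per call."""
    if response_format == "concise":
        return [f"{r['name']}: {r['status']}" for r in rows]
    return [f"{r['name']}: {r['status']} (owner {r['owner']}, updated {r['updated']})" for r in rows]


def list_tickets_output(rows: list, budget_chars: int, offset: int = 0,
                        response_format: Literal["concise", "detailed"] = "concise") -> str:
    """Return rows from `offset`, truncated to `budget_chars` with a leading PARTIAL banner."""
    kept, used = [], 0
    for line in render_rows(rows[offset:], response_format):
        if used + len(line) + 1 > budget_chars:  # +1 for the newline
            break
        kept.append(line)
        used += len(line) + 1
    next_offset = offset + len(kept)
    if next_offset >= len(rows):
        return "\n".join(kept)  # everything fit: no banner needed
    if not kept:
        # No useful prefix, so the contract can't hold: fail with an actionable error.
        raise ValueError(
            f"budget_chars={budget_chars} cannot fit a single row; "
            "raise budget_chars or request response_format='concise'."
        )
    # Structurally distinct leading banner + continuation handle, then the prefix.
    banner = (f"[PARTIAL: rows {offset + 1}-{next_offset} of {len(rows)}. "
              f"Call again with offset={next_offset} for the rest.]")
    return "\n".join([banner, *kept])
```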

Errors

Actionable error: An error that diagnoses the specific problem and names the fix ("end_date must be after start_date. Received…"), so the agent self-corrects on the next call. The error is part of the tool description in practice.; Avoid: opaque codes and tracebacks (400 Bad Request) — they tell the agent nothing it can act on.; Source: semantic-tool-output
RFC 9457 · problem+json: The application/problem+json standard for machine-readable HTTP errors: five base members (type, title, status, detail, instance) plus extension members you define, such as retryable, retry_after, and error_category, which map directly to agent control-flow branches: retry, escalate, or fail fast.; Source: rfc9457-machine-readable-errors
Error preservation: Keeping a failed call and its error in context as a negative example that steers the model off the dead end. Preserve during recovery; compact after success. Stripping reasoning traces was measured to cause a ~30% performance drop.; Avoid: "cleaning up" errors mid-task, which deletes the guardrail.; Source: error-preservation-in-context
Doom loop: The same error repeating 3+ times. The signal to stop preserving, compact, and change strategy — not to keep retrying the same call.; Source: error-preservation-in-context
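A sketch of an actionable error expressed as an RFC 9457 problem body, plus a counter for the doom-loop threshold. The problem `type` URI, the `retryable` / `error_category` extension values, the category-to-branch mapping, and keying the loop check on (tool, type, detail) are assumptions made for illustration, not part of the RFC.

```python
from collections import Counter

BASE_MEMBERS = ("type", "title", "status", "detail", "instance")  # defined by RFC 9457


def invalid_date_range(start_date: str, end_date: str) -> dict:
    """An actionable problem body: what is wrong, and what was actually received."""
    return {
        "type": "https://example.com/problems/invalid-date-range",  # hypothetical URI
        "title": "Invalid date range",
        "status": 422,
        "detail": (f"end_date must be after start_date. "
                   f"Received start_date={start_date}, end_date={end_date}."),
        "instance": "/events",
        # Extension members (not defined by RFC 9457) that drive agent control flow.
        "retryable": False,
        "error_category": "validation",
    }


def next_action(problem: dict) -> str:
    """Map a problem body to an agent branch: retry, escalate, or fail fast."""
    if problem.get("retryable"):
        return "retry"  # e.g. after waiting problem.get("retry_after") seconds
    if problem.get("error_category") in {"auth", "permission"}:
        return "escalate"  # changing the arguments won't fix this
    return "fail_fast"  # don't repeat the call as-is; correct it using `detail`


class DoomLoopDetector:
    """Flags the same error from the same tool repeating `threshold` or more times."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.seen = Counter()

    def record(self, tool: str, problem: dict) -> bool:
        key = (tool, problem.get("type"), problem.get("detail"))
        self.seen[key] += 1
        return self.seen[key] >= self.threshold  # True: compact and change strategy
```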

Managing the Set

Tool consolidation: Folding tools that are always called together — or where one's output always feeds another — into a single tool that maps to one human-understandable action. Reduces selection ambiguity and context footprint.; Avoid: over-consolidation into a black box that hides which step failed, costing failure granularity.; Source: consolidate-agent-tools
Tool overlap: Two tools whose functions you can't distinguish in one sentence. Overlap, more than raw tool count, is what drives selection errors: redundant calls and wrong-tool choices.; Source: tool-minimalism
Namespace prefix: A shared prefix (asana_search, asana_task_create) that signals related tools operate on the same system, reducing confusion when several genuinely distinct tools must coexist.; Source: consolidate-agent-tools
High-level prompting: Defining the outcome and constraints rather than a step-by-step procedure. Prescriptive sequences are brittle when the task varies; the model has information the prompt author didn't. The complement to a minimal toolset.; Avoid: prescriptive scaffolding for capable models — reserve it for weaker models or compliance-driven, audit-trail work.; Source: tool-minimalism
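A sketch of consolidation that keeps failure granularity, assuming two hypothetical calendar steps (`find_free_slot`, `create_event`) that agents always called back-to-back. The consolidated `schedule_meeting` maps to one human-understandable action, but its result still names the step that failed, so it doesn't turn into a black box. Hours are whole integers only to keep the example small.

```python
from typing import Optional


def find_free_slot(busy: list, duration: int, day_start: int = 9, day_end: int = 17) -> Optional[int]:
    """Earliest hour with `duration` free hours before day_end, or None."""
    t = day_start
    for start, end in sorted(busy):
        if start - t >= duration:
            return t
        t = max(t, end)
    return t if day_end - t >= duration else None


def create_event(calendar: list, title: str, start: int, duration: int) -> int:
    """Append an event and return its index as a hypothetical event ID."""
    if not title.strip():
        raise ValueError("title must be non-empty")
    calendar.append({"title": title, "start": start, "end": start + duration})
    return len(calendar) - 1


def schedule_meeting(calendar: list, busy: list, title: str, duration: int) -> dict:
    """One action for the agent; the result still says which internal step failed."""
    start = find_free_slot(busy, duration)
    if start is None:
        return {"ok": False, "failed_step": "find_free_slot",
                "detail": f"No {duration}h gap between 9:00 and 17:00. Try a shorter duration."}
    try:
        event_id = create_event(calendar, title, start, duration)
    except ValueError as e:
        return {"ok": False, "failed_step": "create_event", "detail": str(e)}
    return {"ok": True, "event_id": event_id, "start": start}
```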

Reliability Under Failure

Idempotency · safe-to-retry: The property that running an operation twice leaves the same end state as running it once. Agents fail mid-task and get re-run with no memory of the first run, so idempotency is what keeps a retry from duplicating a branch, a comment, or a charge.; Source: idempotent-agent-operations
Check-before-act: The foundational idempotency technique: one read to test for existing state before any write. Generalises to upsert over create and keying every artifact on a unique identifier (issue number, SHA) so it is findable rather than re-created.; Avoid: a single workflow-level guard — it skips unfinished work on a partial run; guard each artifact instead.; Source: idempotent-agent-operations
Idempotency log: For effects that can't be probed for existence — charges, emails, webhooks — a record written against a unique key before executing and checked before re-executing. The log is the dedup record the resource itself doesn't provide; it must outlive the worst-case retry horizon.; Source: idempotent-agent-operations
Tool annotation: An advisory flag on an MCP tool (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) that the harness may read to make scheduling or gating decisions. Metadata, not a guarantee.; Avoid: trusting it unaudited. The spec treats annotations as untrusted unless they come from a trusted server, and a mis-marked mutating tool races under parallel dispatch.; Source: read-only-hint-concurrency
Hint-driven concurrency · sum-to-max: A harness running honest readOnlyHint tools in parallel, collapsing the wall-clock cost of N independent reads from sum(latency) toward max(latency). Near-zero effort to wire (a static annotation lookup), but only safe when the annotations are honest.; Source: read-only-hint-concurrency
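A sketch of check-before-act and an idempotency log, assuming an in-memory set of branch names and an in-memory log. A real harness would back both with durable storage. The function and class names and the `fix/issue-N` naming scheme are hypothetical. One design choice is worth noting: a record left pending by a crash blocks blind re-execution, because the log can't tell whether the charge or email already went out.

```python
from typing import Any, Callable


def ensure_branch(branches: set, issue_number: int) -> str:
    """Check-before-act: the branch is keyed on the issue number, so it is findable."""
    name = f"fix/issue-{issue_number}"
    if name not in branches:  # one read before the write
        branches.add(name)
    return name


class IdempotencyLog:
    """In-memory stand-in; a real log must be durable and outlive the worst-case retry horizon."""

    def __init__(self) -> None:
        self._records: dict = {}

    def run_once(self, key: str, effect: Callable[[], Any]) -> Any:
        record = self._records.get(key)
        if record is not None:
            if record["status"] == "done":
                return record["result"]  # already executed: replay the stored result
            # Pending: an earlier run started the effect but never recorded the outcome.
            raise RuntimeError(f"{key}: outcome unknown from an earlier run; reconcile before retrying")
        self._records[key] = {"status": "pending"}  # written BEFORE executing
        result = effect()  # if this raises, the record stays pending
        self._records[key] = {"status": "done", "result": result}
        return result
```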

Scaling the Set

Eager vs. just-in-time loading · alwaysLoad: The per-server choice between keeping a tool's definitions in the prompt prefix every turn (eager) and deferring them behind a tool-search step (JIT). Classify on hit rate, definition size, latency tolerance, and description quality; keep the 3–5 hottest tools eager.; Source: mcp-eager-vs-jit-loading
Selection cliff: The point — roughly 30–50 visible tools — past which tool selection accuracy degrades significantly, as the correct tool gets harder to find among distractors. The reason discoverability, not just consolidation, matters at scale.; Source: mcp-eager-vs-jit-loading
Search-miss: The silent failure of deferred loading: a tool whose description doesn't match the model's natural search terms becomes invisible, and the agent reports "no tool available" when the tool exists. Audit description craft before deferring.; Source: mcp-eager-vs-jit-loading
MCP primitive: One of the three things an MCP server exposes: a tool (model-invoked action), a resource (application-attached read-only context), or a prompt (user-triggered workflow). Picking the wrong one creates friction before naming or schema matter.; Avoid: exposing static read-only context as a tool — that forces a selection decision the agent shouldn't have to make.; Source: mcp-server-design
Structured content · outputSchema: Validated, schema-typed output a server returns alongside serialized JSON, so the agent relies on a known shape rather than parsing free text. Pairs with the two MCP error channels: protocol errors for the client, isError tool errors for the agent.; Source: mcp-server-design
Catalog token tax: The standing token cost of tool definitions, paid every turn before any call because the definitions live in context. A large catalog can run to tens of thousands of tokens; deferring the long tail and trimming schemas is how you stop paying it.; Source: mcp-server-design
Budget the task, not the call: The cost-budgeting rule: the locally cheapest call can be the most expensive across a task. An over-trimmed output forces a second round-trip; parallel dispatch against a rate-limited backend burns recovery turns. Price the loop, not the single call.; Source: read-only-hint-concurrency
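A minimal sketch of the eager/JIT split and of how a search-miss happens, assuming per-tool hit-rate telemetry is available. It classifies on hit rate alone (the eager vs. just-in-time entry also weighs definition size, latency tolerance, and description quality), and the naive keyword match stands in for whatever tool-search mechanism the harness actually uses. Names, thresholds, and the sample tools are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolDef:
    name: str
    description: str
    hit_rate: float  # assumed telemetry: share of sessions that call this tool


def split_eager_jit(tools: list, max_eager: int = 5, min_hit_rate: float = 0.2):
    """Keep the hottest few tools eager; defer the long tail behind tool search."""
    ranked = sorted(tools, key=lambda t: t.hit_rate, reverse=True)
    eager = [t for t in ranked[:max_eager] if t.hit_rate >= min_hit_rate]
    eager_names = {t.name for t in eager}
    deferred = [t for t in ranked if t.name not in eager_names]
    return eager, deferred


def words(text: str) -> set:
    """Lowercase alphanumeric tokens; underscores and punctuation split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(deferred: list, query: str) -> list:
    """Naive keyword search over names and descriptions (a stand-in for BM25 or embeddings)."""
    q = words(query)
    return [t.name for t in deferred if q & words(t.name + " " + t.description)]
```

Note the failure mode this exposes: a deferred tool whose description shares no words with how the agent phrases its need (for example, a `rotate_keys` tool described only as "Rotate credentials." against a query like "refresh api token") is never returned, even though it exists.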