Capstone — The Tool Engineer's Decision Table

Fourteen lessons, one habit: read the symptom, find the surface, apply the move. Here it is as a single routing table — now spanning reliability, scale, cost, and the surfaces beyond the catalog — plus a mixed review of the whole course.

Why this, for you: the page to keep open while you design. Every recurring agent-tool failure maps to a surface and a move you now know. This capstone collapses the course into a diagnostic you can run in seconds against a real tool — and then tests that you've internalised it.

The through-line of the whole course: agent quality is bounded by tool quality, and the bound is set by the surface, not the implementation. When an agent misuses a tool, you don't reach for the prompt — you read the symptom off the surface and apply the matching move.

1 The decision table

Match the symptom to the surface, then to the fix. Each row cites the lesson (L2–L14) that teaches the move.

Symptom you observe	Surface	The move
Agent picks the wrong tool, or doesn't find the right one	Name & description	Add selection signals: "use when X, prefer over Y when Z" (L2)
Agent passes malformed or out-of-range parameters	Schema	Poka-yoke: enums, bounded ranges + defaults, prerequisite gates (L2)
Context fills fast; quality drops mid-session	Output volume	Return only the next decision's inputs; paragraph heuristic (L3)
Agent hallucinates field names or miscopies IDs	Output shape	Semantic names over UUIDs; only decision-relevant fields (L4)
Big results overflow or get silently truncated	Output overflow	PARTIAL: prefix + leading marker + continuation handle (L4)
Agent retries a failing call or gives up	Errors	Diagnose + direct; RFC 9457 fields; preserve the trace (L5)
Agent hesitates between tools or chains many calls	The set	Consolidate overlap; namespace; prompt to outcomes (L6)
A re-run after failure duplicates branches, comments, or charges	Effects	Check-before-act, upsert, unique keys; idempotency log for side effects (L7)
Parallel reads race, or a "read" tool quietly mutates state	Annotations	Honest `readOnlyHint`/`idempotentHint`; audit before trusting (L8)
Selection degrades as the catalog grows; "tool not available"	Discoverability	Eager-load the 3–5 hot tools; defer the rest behind tool search (L9)
Wrong primitive, opaque names, unstructured server output	MCP exposure	Tool vs resource vs prompt; `verb_noun`; `outputSchema` (L10)
Calls are correct but the toolset is slow or expensive	Cost & latency	Budget the catalog tax; run independent reads in parallel so latency is their `max`, not their sum; clear spent results (L11)
Agent misuses a correct tool; the gap is usage knowledge a schema can't carry	Skill packaging	Package knowledge (not behavior); description gate + Gotchas; fork heavy work (L12)
A rule must hold whatever the model decides (e.g. never push to main, always use pnpm)	Lifecycle enforcement	Hook it, don't prompt it: `PreToolUse` + exit 2; cover substitution paths (L13)
The typed catalog is large but the model already knows the CLI	Tool interface	Collapse toward one `run()`; split execution from presentation; guard with a hook (L14)
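
The overflow row (L4) is the one most often gotten wrong in code, so here is a minimal sketch of it, assuming a plain integer offset serves as the continuation handle. `render_page`, `Page`, and the exact marker wording are hypothetical illustrations, not a fixed format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    text: str                   # what the tool actually returns to the agent
    next_cursor: Optional[int]  # continuation handle; None when nothing is left


def render_page(rows: list[str], cursor: int = 0, limit: int = 50) -> Page:
    """Return at most `limit` rows starting at `cursor`.

    When the result is cut short, the marker comes first, before any data,
    and names the exact call that fetches the rest.
    """
    chunk = rows[cursor:cursor + limit]
    end = cursor + len(chunk)
    if end < len(rows):
        marker = (
            f"PARTIAL: showing {cursor + 1}-{end} of {len(rows)}. "
            f"Call again with cursor={end} for the rest."
        )
        return Page("\n".join([marker, *chunk]), next_cursor=end)
    return Page("\n".join(chunk), next_cursor=None)
```

With 120 rows and `limit=50`, the first call opens with `PARTIAL: showing 1-50 of 120. Call again with cursor=50 for the rest.`, and a call at cursor 100 returns the last 20 rows with no marker. Leading with the marker means an agent reading from the top learns the result is incomplete before it reasons over the data.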

2 Four tensions to hold

The course isn't a list of "always do X" rules — its levers pull against each other, and engineering is picking the right point on each axis.

Completeness vs. economy. Richer descriptions and outputs prevent misuse, but every token is paid on every call.
Consolidation vs. granularity. Fewer tools cut selection ambiguity, but a merged black box hides which step failed.
Eager vs. deferred. Keeping a tool in context saves a discovery round-trip, but past roughly 30–50 visible tools it dilutes selection.
Enforce vs. guide. A hook makes a rule absolute, but it sees parameters, not intent, so anything contextual belongs in the prompt.

There is no universal setting: the right point on each axis depends on whether the tool is durable, shared, and frequently hit.
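
Enforce vs. guide is easiest to see in code. Below is a minimal `PreToolUse` hook sketch (L13), assuming Claude Code's hook contract: the event arrives as JSON on stdin with `tool_name` and `tool_input`, and exit code 2 blocks the call and returns stderr to the model. The two rules and the `--hook` switch are hypothetical. Notice that `check()` only ever sees the command string, never the reason behind it.

```python
from __future__ import annotations

import json
import re
import sys

# Hypothetical policy: (pattern over the Bash command, message returned to the model).
RULES = [
    # Match "main" as a whole ref so "git push origin HEAD:main" is caught too,
    # while branches such as "feature/main-fix" are not.
    (re.compile(r"\bgit\s+push\b.*(?:\s|:)main(?:\s|$)"),
     "Pushing to main is blocked. Push a feature branch and open a PR instead."),
    # npm or yarn installs at the start of the command or after ; & |
    (re.compile(r"(?:^|[;&|]\s*)(?:npm|yarn)\s+(?:install|add|i)\b"),
     "This repo uses pnpm. Re-run the install with pnpm."),
]


def check(payload: dict) -> tuple[int, str]:
    """Decide one PreToolUse event: (0, "") allows it, (2, message) blocks it."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = (payload.get("tool_input") or {}).get("command", "")
    for pattern, message in RULES:
        if pattern.search(command):
            return 2, message
    return 0, ""


def main() -> None:
    code, message = check(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # on exit 2, stderr is fed back to the model
    sys.exit(code)


# Only act as a hook when invoked with --hook, so importing or testing has no side effects.
if __name__ == "__main__" and sys.argv[1:] == ["--hook"]:
    main()
```

The `HEAD:main` case is the substitution-path coverage from the table: a rule that matches only the obvious spelling is a rule the model can route around. Anything the regex cannot express, such as "push to main is fine during a release freeze", stays in the prompt.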

The unifying test

For almost every decision in this course, one question routes it: does this make the agent's correct next action easier to take? If a field, a tool merge, an annotation, or an error rewrite makes the next step clearer or safer, keep it. If it adds tokens, ambiguity, or a race without making the next action easier, cut it. The "fix the interface, not the prompt" loop terminates; the prompt-patch loop doesn't.

3 Where this is overhead, not investment

The whole discipline assumes a stable, shared tool called across many sessions. That's where engineering the surface pays back. It's overhead — sometimes net-negative — for one-off exploratory scripts, tools wrapping a well-documented API the model already has strong priors for, or interfaces still changing every sprint, where heavy docs drift from reality and mislead more than a terse stub. The reliability and scaling moves carry the same caveat: idempotency guards, honest annotations, and defer-vs-eager tuning earn their keep on durable, high-traffic tools and are ceremony on a throwaway one. Engineer durable surfaces; keep throwaway ones thin.
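
Of those reliability moves, the idempotency guard is the one worth seeing concretely. A minimal sketch, assuming an in-memory dict stands in for a durable store; `IdempotencyLog`, `charge`, and the key scheme are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Callable


class IdempotencyLog:
    """In-memory stand-in for a durable table keyed by idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    def run_once(self, key: str, effect: Callable[[], Any]) -> Any:
        # Check before acting: a known key means the effect already happened.
        if key in self._results:
            return self._results[key]
        result = effect()
        self._results[key] = result  # recorded only after the effect succeeds
        return result


def idempotency_key(tool: str, args: dict) -> str:
    """Stable key from the tool name plus canonicalised (sorted) arguments."""
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


charges: list[int] = []  # hypothetical external side effect, e.g. a payment API


def charge(amount_cents: int, order_id: str, log: IdempotencyLog) -> str:
    """Hypothetical side-effecting tool; safe to re-run after a failure."""
    key = idempotency_key("charge", {"amount_cents": amount_cents, "order_id": order_id})

    def effect() -> str:
        charges.append(amount_cents)
        return f"charged {amount_cents} for {order_id}"

    return log.run_once(key, effect)
```

Deriving the key from the arguments means two intentionally identical charges would also collapse into one, which is why real APIs usually let the caller pass an explicit key. A production log must also record the key atomically with the effect; this sketch records it afterwards, so a crash between the two could still allow a duplicate.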

↪ Your win: the whole discipline, in one habit

Read the symptom, find the surface, apply the move — the decision table is the routing.
Selection signals + poka-yoke get the right tool called with the right arguments.
Size then shape output: next-decision inputs, semantic fields, graceful overflow.
Errors teach: diagnose, direct, structure, and preserve the failure during recovery.
Make effects safe: idempotent re-runs, honest annotations, no silent mutation.
Right-size and right-load the set, and budget its cost across the task.
Reach beyond the catalog: package usage knowledge as a skill, enforce non-negotiables with a hook, and collapse toward a CLI where the model already knows it — then ask: does it make the agent's next action easier?
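
"Errors teach" is the item most often shipped as a bare 500. A minimal sketch of a tool error shaped as RFC 9457 problem details: `type`, `title`, `status`, and `detail` are the RFC's own members, while `next_action`, `retryable`, and `cause` are hypothetical extension members (the RFC permits extensions). `tool_error`, the example URI, and `list_runs` are illustrative names.

```python
from __future__ import annotations

import json
from typing import Optional


def tool_error(
    type_uri: str,
    status: int,
    title: str,
    detail: str,
    next_action: str,
    retryable: bool,
    cause: Optional[str] = None,
) -> str:
    """Render a tool failure as RFC 9457 problem details (JSON text)."""
    problem = {
        "type": type_uri,            # URI identifying this class of problem
        "title": title,              # short, stable summary of the problem type
        "status": status,
        "detail": detail,            # diagnose: what went wrong in this call
        "next_action": next_action,  # direct: the concrete call to make next
        "retryable": retryable,      # whether an identical retry could succeed
    }
    if cause is not None:
        problem["cause"] = cause     # preserve the underlying error for tracing
    return json.dumps(problem)


example = tool_error(
    type_uri="https://example.com/problems/run-not-found",  # hypothetical URI
    status=404,
    title="Run not found",
    detail="No run named 'nightly-0412' exists in project 'web'.",
    next_action="Call list_runs(project='web') and retry with a name it returns.",
    retryable=False,
)
```

Compared with a bare 500, the agent now knows the name was wrong, that retrying unchanged is pointless, and which call recovers.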

Mixed review: the whole course. Recall, don't peek.

Question 1 · from Lesson 01
"Agent quality is bounded by tool quality" tells you to fix a misuse by changing the…

Question 2 · from Lessons 04 & 05
A tool returning a 40-field run object with raw UUIDs, then a bare 500 on failure, breaks two rules: shape the output and…

Question 3 · from Lesson 07
The first technique that makes a re-run after failure safe is to…

Question 4 · from Lesson 08
A harness that dispatches readOnlyHint: true tools in parallel is only safe once you have…

Question 5 · from Lessons 09 & 11
Past roughly thirty to fifty visible tools, eager-loading the whole catalog mainly costs you…

Question 6 · from Lessons 12, 13 & 14
Match the surface beyond the catalog: usage knowledge a schema can't carry, a rule that must hold whatever the model decides, and a model that already knows the CLI go to…
