Ship a Server Agents Can Drive

Eight lessons, one decision sequence. Here's the whole course as a table you run top to bottom, plus a mixed review that pulls from every part.

Why this, for you: each individual choice has a defensible answer. The production question is the sequence, because each decision constrains the options left for the next. This capstone gives the order to resolve them in, so a real server lands safe, discoverable, and cheap to keep installed.

A well-designed MCP server makes the right tool call obvious, costs little to keep loaded, and can't be onboarded into a trifecta. The first six decisions shape the core surface; the last three cover the deeper protocol you reach for once the server scales.

1 The decision table

#	Decision	Resolve it by…	From
1	Primitive	Tool (model acts), resource (client attaches read-only context), or prompt (user triggers workflow)	L1
2	Transport	stdio for local dev tooling; Streamable HTTP for shared/remote — and remote forces OAuth	L1
3	Data exposure	Expose corpora behind search/read tools; return only what the next step needs; guard retrieval quality	L2
4	Tool craft	verb_noun ≤32 chars; enums + defaults + examples; negative guidance; actionable errors	L3
5	Security	Audit each path for the trifecta; remove a leg (egress first); scope credentials; protect config	L4
6	Load & discovery	Keep catalog <15 tools; eager-load below 10/10K, else defer; search-friendly descriptions; name for intent	L5
7	Output & hints	Declare `outputSchema` + return structuredContent; annotate behavior — but don't trust hints from untrusted servers	L6
8	Server-initiated	Elicit mid-call inputs (flat fields); sample for model reasoning — client picks the model, user approves each request	L7
9	Result orchestration	Code Mode for data-heavy chains — only `stdout` returns; not for in-between reasoning, and not ZDR-eligible	L8
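To make rows 3, 4, and 7 concrete, here is a minimal sketch of one tool carried end to end. It uses plain dictionaries rather than any SDK. The tool name `search_tickets`, its parameters, the ticket data, and the `error_result` helper are all hypothetical; only the field names (`inputSchema`, `outputSchema`, `annotations`, `structuredContent`, `isError`) follow the MCP tool shape.

```python
import json

# Hypothetical tool definition, written as a plain dict (no SDK assumed).
SEARCH_TICKETS = {
    "name": "search_tickets",  # verb_noun, snake_case, well under 32 chars
    "description": (
        "Search support tickets by keyword and status. "
        "Do NOT use this to fetch a single ticket by ID."  # negative guidance
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string",
                      "description": "Keywords, e.g. 'refund delayed'."},
            "status": {"type": "string", "enum": ["open", "closed", "any"],
                       "default": "any",
                       "description": "Ticket state to match, e.g. 'open'."},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50,
                      "default": 10,
                      "description": "Maximum tickets to return (1-50)."},
        },
        "required": ["query"],
        "additionalProperties": False,
    },
    "outputSchema": {
        "type": "object",
        "properties": {
            "tickets": {"type": "array", "items": {
                "type": "object",
                "properties": {"id": {"type": "string"},
                               "title": {"type": "string"}},
                "required": ["id", "title"]}},
            "total": {"type": "integer"},
        },
        "required": ["tickets", "total"],
    },
    "annotations": {"readOnlyHint": True},  # a hint only; must be honest
}


def error_result(violation, constraint, recovery):
    """Actionable error: what broke, the rule, and how to recover."""
    text = f"{violation} Constraint: {constraint} Recovery: {recovery}"
    return {"isError": True, "content": [{"type": "text", "text": text}]}


def search_tickets(args, tickets):
    """Toy handler over an in-memory list of ticket dicts."""
    allowed = SEARCH_TICKETS["inputSchema"]["properties"]["status"]["enum"]
    status = args.get("status", "any")
    if status not in allowed:
        return error_result(
            f"status={status!r} is not a valid value.",
            f"status must be one of {allowed}.",
            "Retry with status='any' to search all tickets.",
        )
    limit = args.get("limit", 10)
    query = args["query"].lower()
    hits = [t for t in tickets
            if query in t["title"].lower()
            and (status == "any" or t["status"] == status)]
    # Return only what the next step needs: ids and titles, not full bodies.
    structured = {
        "tickets": [{"id": t["id"], "title": t["title"]} for t in hits[:limit]],
        "total": len(hits),
    }
    return {
        "content": [{"type": "text", "text": json.dumps(structured)}],  # JSON copy
        "structuredContent": structured,
    }
```

The handler drops each ticket's body on purpose: a follow-up read tool can fetch it if the agent decides it needs it.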

The reference extreme: Cloudflare exposes ~2,500 API endpoints through two tools, search and execute, in roughly 1K tokens. Every layer lines up: remote server → intent grouping at its limit → schemas needn't be deferred → programmatic calling for large results → OAuth for authorization.

2 The author's checklist

# run this before you ship
[ ] Each tool is verb_noun snake_case, ≤32 chars, no versions
[ ] Every param has a description with constraints and examples
[ ] Enums + defaults + additionalProperties:false used where possible
[ ] Descriptions say when NOT to use the tool
[ ] Errors carry the violation, the constraint, and recovery context
[ ] Read-only context is a resource, not a tool
[ ] Tool list is under 15 tools per server
[ ] Responses return only what the agent needs next
[ ] Clear server instructions so tool search finds you
[ ] outputSchema declared; structuredContent returned with a JSON copy
[ ] Annotations honest; never trusted from an untrusted server
[ ] Elicitation/sampling used only where needed, behind their gates
[ ] Code Mode considered for data-heavy chains (sandbox available)
[ ] No execution path holds all three trifecta legs
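Several of these boxes are mechanical enough to check in code before a human reviews the rest. The sketch below is one possible linter over tool definitions written as plain dictionaries. The function name `lint_catalog` and both regular expressions are illustrative assumptions, not part of any MCP SDK, and the judgment items (negative guidance, honest annotations, the trifecta audit) still need a person.

```python
import re

# Assumed patterns: at least two lowercase words joined by underscores,
# and a "v<digits>" segment treated as a version marker.
NAME_RE = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")
VERSION_RE = re.compile(r"(^|_)v\d+($|_)")


def lint_catalog(tools, max_tools=15):
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = []
    if len(tools) >= max_tools:
        problems.append(
            f"catalog has {len(tools)} tools; keep it under {max_tools}")
    for tool in tools:
        name = tool.get("name", "")
        if not NAME_RE.match(name):
            problems.append(f"{name}: name is not verb_noun snake_case")
        if len(name) > 32:
            problems.append(f"{name}: name is longer than 32 chars")
        if VERSION_RE.search(name):
            problems.append(f"{name}: name carries a version suffix")
        schema = tool.get("inputSchema", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{name}: additionalProperties is not false")
        for param, spec in schema.get("properties", {}).items():
            if not spec.get("description"):
                problems.append(f"{name}.{param}: missing description")
        if "outputSchema" not in tool:
            problems.append(f"{name}: no outputSchema declared")
    return problems
```

Running this in CI against the server's exported tool list would catch regressions in naming and schema hygiene; it says nothing about whether the descriptions are actually good.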

3 Where the checklist inverts

The checklist assumes a stable, internally-owned API. Know the conditions that flip each rule:

Enums vs. evolving upstreams — a thin string type is more durable when the upstream adds values often.
Schemas don't sanitize input — the stdio model can run commands on injected args; argument sanitization is the fix, not a richer schema.
Over-consolidation hurts routing — one polymorphic tool pushes disambiguation into the schema; the ceiling depends on description distinctness, not raw count.
An unavoidable trifecta — when a leg can't be removed, add compensating controls: output scanning, rate-limiting, egress anomaly detection.
Annotations as a security control — a readOnlyHint from an untrusted server is a claim, not a sandbox; gate writes and egress regardless.
Code Mode without a sandbox — it's inert air-gapped or on-prem, loses in-between reasoning, and is not ZDR-eligible; keep the round-trip loop there.
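The second inversion, that schemas don't sanitize input, can be illustrated with a stdio server that wraps a command-line program. A schema of `type: string` accepts `main`, but it equally accepts a value that starts with `-` or contains shell metacharacters. The sketch below assumes a hypothetical tool that runs `git log` on a caller-supplied ref; the allowlist pattern and function names are illustrative choices, not a complete defense.

```python
import re
import subprocess

# Assumed allowlist: starts with a letter or digit (so it can never be parsed
# as an option) and contains only characters common in branch and tag names.
SAFE_REF = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]{0,99}")


def build_git_log_argv(ref: str) -> list[str]:
    """Validate the model-supplied ref and build an argv list (no shell)."""
    if not SAFE_REF.fullmatch(ref):
        raise ValueError(
            f"ref={ref!r} rejected. Constraint: letters, digits, '.', '_', "
            "'/', '-' only, not starting with '-'. "
            "Recovery: pass a branch or tag name such as 'main'."
        )
    # The trailing "--" tells git that no further revisions or options follow.
    return ["git", "log", "--oneline", "-n", "5", ref, "--"]


def run_git_log(ref: str, repo_dir: str) -> str:
    """Run the command with an argument list, never a shell string."""
    argv = build_git_log_argv(ref)
    done = subprocess.run(argv, cwd=repo_dir, capture_output=True,
                          text=True, check=False, timeout=10)
    return done.stdout
```

Passing a list to `subprocess.run` keeps the shell out of the picture, and the allowlist keeps option injection out; a richer JSON schema alone would do neither.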

↪ Your win: the whole discipline, in order

Primitive → transport → data → tool craft → security → discovery — resolve in sequence.
Make the right call obvious with names, schemas, examples, and negative guidance.
Gate onboarding — no path holds all three legs; remove egress first.
Stay affordable — small catalog, eager-vs-JIT by hit rate, search-friendly prose.
Type and gate the deeper protocol — output schemas, honest hints, gated server-initiated calls, Code Mode for result bloat.
Know where the rules invert before you trust the checklist blindly.
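The gating rule above can be sketched as a small audit over execution paths. The leg names used here (`private_data`, `untrusted_content`, `egress`) are an assumption based on the common framing of the trifecta, and the `Tool` class, tool names, and path names are hypothetical. The function flags any path whose tools together hold all three legs and lists the egress tools as the first candidates for removal.

```python
from dataclasses import dataclass

# Assumed leg names; adjust to however your own audit labels them.
LEGS = frozenset({"private_data", "untrusted_content", "egress"})


@dataclass(frozen=True)
class Tool:
    name: str
    legs: frozenset  # which trifecta legs this tool contributes


def audit_paths(paths):
    """Map each path that holds all three legs to its egress tools.

    `paths` maps a path name to the list of tools reachable on that path.
    """
    flagged = {}
    for path_name, tools in paths.items():
        held = frozenset().union(*(t.legs for t in tools))
        if held >= LEGS:
            # Egress is the leg to try removing first.
            flagged[path_name] = [t.name for t in tools if "egress" in t.legs]
    return flagged


read_customers = Tool("read_customer_record", frozenset({"private_data"}))
read_inbox = Tool("read_inbound_email", frozenset({"untrusted_content"}))
send_email = Tool("send_email", frozenset({"egress"}))

paths = {
    "support_triage": [read_customers, read_inbox, send_email],
    "inbox_summary": [read_inbox],
}
print(audit_paths(paths))
```

An empty result means no audited path holds all three legs; it does not prove safety for paths the audit never listed.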

Mixed review — every part in play

Question 1 · L1: A reusable multi-step workflow a user triggers is best modeled as a…

Question 2 · L3: Which error message lets an agent self-correct without a human?

Question 3 · L4: A deploy agent holds env vars, reads repo config, and posts externally. It is…

Question 4 · L6: A readOnlyHint from an untrusted MCP server should be treated as…

Question 5 · L8 · spaced recall: In Code Mode, what reaches the model's context is…

Ask me anything. Bring a server you're designing and we'll run the decisions top to bottom — primitive, transport, data exposure, tool craft, trifecta audit, load policy, output and annotations, server-initiated calls, and result orchestration — and produce the author's checklist filled in for your case.
