Part 3 · The Deeper Protocol

MCP Server Design · ~7 min

Code Mode

When a workflow chains twenty tool calls and most of the data never matters, the fix isn't a better schema — it's letting the agent write code that does the chaining and hands back only the answer.

Why this, for you: the single biggest token lever for large result sets isn't trimming a field — it's keeping intermediate results out of context entirely. Programmatic calling lets the agent orchestrate many of your tools in a sandbox and return only the final output, which changes how you'd shape a server meant to scale.

You've kept the catalog small (Lesson 5) and your returns scoped (Lesson 2). But some tasks still chain many calls whose intermediate data is pure noise. Programmatic tool calling — "Code Mode" — is the pattern for those: the agent writes code that orchestrates the calls and only the final result enters its context.

1 From a round-trip loop to one code block

Normally the model makes one tool call, reads the result, makes the next — a round-trip per call, every intermediate result landing in context. Code Mode replaces the loop with a single sandboxed program: the model writes code that calls your tools, processes the results in the sandbox, and prints only what matters.

    # the model writes code that orchestrates many calls in one pass
    team = await get_team_members("engineering")
    budgets = await asyncio.gather(*[get_budget_by_level(l) for l in levels])
    expenses = await asyncio.gather(*[get_expenses(m["id"], "Q3") for m in team])
    # only the filtered result is printed; everything else stays in the sandbox
    print(json.dumps(over_budget))

Intermediate results are processed in the sandbox and never enter the model's context; only stdout from the code returns. For a data-heavy chain, that's the difference between paying for thousands of records and paying for one line.
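To make the "only stdout returns" point concrete, here is a minimal runnable sketch of that kind of orchestration. The three tool functions are hypothetical in-memory stubs, and the record fields ("level", "limit", "amount") and the over-budget rule are assumptions chosen for illustration; a real sandbox would bind these names to your server's actual tools.

```python
import asyncio
import json

# Hypothetical sample data standing in for what real tools would return.
TEAM = [
    {"id": "u1", "name": "Ana", "level": "senior"},
    {"id": "u2", "name": "Bo", "level": "junior"},
    {"id": "u3", "name": "Cy", "level": "senior"},
]
BUDGETS = {"junior": 1000, "senior": 2000}
EXPENSES = {
    "u1": [{"amount": 1500}, {"amount": 900}],
    "u2": [{"amount": 400}],
    "u3": [{"amount": 300}, {"amount": 200}],
}


# Stub tools (assumed signatures); each would be a tool call in a real sandbox.
async def get_team_members(department):
    return TEAM


async def get_budget_by_level(level):
    return {"level": level, "limit": BUDGETS[level]}


async def get_expenses(member_id, quarter):
    return EXPENSES[member_id]


async def find_over_budget():
    team = await get_team_members("engineering")
    # Fetch one budget per distinct level, concurrently.
    levels = sorted({m["level"] for m in team})
    budgets = await asyncio.gather(*[get_budget_by_level(l) for l in levels])
    limit_by_level = {b["level"]: b["limit"] for b in budgets}
    # Fetch every member's expenses concurrently; results stay local.
    expenses = await asyncio.gather(*[get_expenses(m["id"], "Q3") for m in team])
    over_budget = []
    for member, items in zip(team, expenses):
        spent = sum(item["amount"] for item in items)
        if spent > limit_by_level[member["level"]]:
            over_budget.append({"name": member["name"], "spent": spent})
    return over_budget


if __name__ == "__main__":
    # Only this one line would reach the model's context.
    print(json.dumps(asyncio.run(find_over_budget())))
    # prints [{"name": "Ana", "spent": 2400}]
```

The team list, both budget records, and every expense item are read inside the function and discarded; the printed line is the only thing the model would pay for.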

2 The biggest token lever for large result sets

This is the search-and-execute idea at its extreme. Cloudflare's reference server fronts roughly 2,500 API endpoints with just two tools, search and execute, in about 1K tokens. It then uses code orchestration so that a task like "enable DNSSEC on every zone where it's off" loops in the sandbox and returns only the changed zones instead of pulling thousands of records into the window.
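The two-tool shape can be sketched in a few lines. This is an illustration of the pattern only, not Cloudflare's implementation: the catalog, the endpoint names, the `call` helper, and the zone data are all hypothetical, and Python's `exec` stands in for a real sandbox (it provides no isolation and should not be used that way in production).

```python
# Hypothetical in-memory stand-in for a large API surface.
ZONES = {
    "example.com": {"dnssec": False},
    "example.org": {"dnssec": True},
    "example.net": {"dnssec": False},
}


def list_zones():
    return sorted(ZONES)


def get_dnssec(zone):
    return ZONES[zone]["dnssec"]


def enable_dnssec(zone):
    ZONES[zone]["dnssec"] = True


# Endpoint catalog: name -> (description, callable). Never sent to the model.
CATALOG = {
    "zones.list": ("List all zone names", list_zones),
    "dnssec.get": ("Read DNSSEC status for a zone", get_dnssec),
    "dnssec.enable": ("Enable DNSSEC for a zone", enable_dnssec),
}


def search(query):
    """Tool 1: return only the endpoints whose name or description matches."""
    q = query.lower()
    return [
        {"name": name, "description": desc}
        for name, (desc, _) in CATALOG.items()
        if q in name.lower() or q in desc.lower()
    ]


def execute(code):
    """Tool 2: run agent-written code and return only its `result` variable.

    exec() is NOT a sandbox; it is used here purely to show the data flow.
    """
    def call(name, *args):
        return CATALOG[name][1](*args)

    scope = {"call": call}
    exec(code, scope)
    return scope.get("result")


# Code the agent might write after discovering endpoints via search().
AGENT_CODE = """
changed = []
for zone in call("zones.list"):
    if not call("dnssec.get", zone):
        call("dnssec.enable", zone)
        changed.append(zone)
result = changed
"""

if __name__ == "__main__":
    print([e["name"] for e in search("dnssec")])  # ['dnssec.get', 'dnssec.enable']
    print(execute(AGENT_CODE))  # ['example.com', 'example.net']
```

The loop touches every zone, but the model sees only the short list of zones that changed.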

Approach                           Tokens (≈2,500 endpoints)
All tools loaded upfront           ~1,170,000
Tool search (top-k matching)       ~8,700
Code Mode (typed SDK + 2 tools)    ~1,000

On complex multi-step research, programmatic calling cut tokens 37% (43.6K → 27.3K) and eliminated 19+ inference passes by orchestrating 20+ calls in one block. It's a different lever than tool search: search shrinks definitions; Code Mode shrinks results.
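If you want to check the arithmetic behind those figures, a few lines suffice. The numbers are the ones quoted in this lesson; the helper name `percent_reduction` is just an illustrative choice.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Approximate token costs quoted for a ~2,500-endpoint server.
TOKENS = {
    "All tools loaded upfront": 1_170_000,
    "Tool search (top-k matching)": 8_700,
    "Code Mode (typed SDK + 2 tools)": 1_000,
}

if __name__ == "__main__":
    # Multi-step research example: 43.6K tokens down to 27.3K.
    print(f"{percent_reduction(43_600, 27_300):.1f}%")  # 37.4%

    # How many times smaller each approach is than loading everything.
    baseline = TOKENS["All tools loaded upfront"]
    for approach, tokens in TOKENS.items():
        print(f"{approach}: {baseline / tokens:,.0f}x")
```

The 37% figure is a saving on results flowing through context, while the table compares definition costs, which is why the two numbers are on such different scales.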

3 When Code Mode is the wrong tool

The sandbox is what makes it cheap, and also what bounds where it fits. Four conditions flip it:

It layers on tool search rather than replacing it

These solve different bottlenecks: tool search for definition bloat, Code Mode for result bloat. A large remote server often wants both — defer the catalog behind search, then let the agent code-orchestrate the calls it discovers. Verify sandbox availability before you design a server to depend on it.
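One way to reason about the layering is as two independent checks, one per bottleneck. The sketch below is a hypothetical decision helper, not part of any SDK: the `Workload` fields, the 10,000-token thresholds, and the lever names are all assumptions you would tune for your own server. The fallback to scoped returns reflects the point that Code Mode only helps where a sandbox actually exists.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    definition_tokens: int    # cost of loading every tool definition upfront
    intermediate_tokens: int  # intermediate result data a typical task produces
    sandbox_available: bool   # can the client run agent-written code?


def pick_levers(w, definition_budget=10_000, result_budget=10_000):
    """Return the token levers worth applying (thresholds are illustrative)."""
    levers = []
    # Definition bloat: defer the catalog behind search.
    if w.definition_tokens > definition_budget:
        levers.append("tool_search")
    # Result bloat: orchestrate in code if possible, else trim what tools return.
    if w.intermediate_tokens > result_budget:
        levers.append("code_mode" if w.sandbox_available else "scoped_returns")
    return levers


if __name__ == "__main__":
    print(pick_levers(Workload(1_170_000, 50_000, True)))
    # ['tool_search', 'code_mode']
    print(pick_levers(Workload(2_000, 50_000, False)))
    # ['scoped_returns']
```

Because the two checks are independent, a large remote server with a heavy result chain ends up with both levers, which matches the "often wants both" case above.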

↪ Your win: orchestrate in the sandbox, return the answer

Retrieval practice — recall, don't peek

Question 1: In programmatic tool calling, what enters the model's context is…

Question 2: Code Mode's main token win comes from shrinking…

Question 3: Programmatic calling is NOT available when you need…

Question 4: Relative to tool search, Code Mode is a lever on…

Question 5 · spaced recall from Lesson 7: An MCP sampling/createMessage request flows…

Next: the Capstone, where every decision in this course is sequenced into one server you can ship.