When a workflow chains twenty tool calls and most of the data never matters, the fix isn't a better
schema — it's letting the agent write code that does the chaining and hands back only the answer.
Why this, for you: the single biggest token lever for large result sets isn't trimming a field —
it's keeping intermediate results out of context entirely. Programmatic calling lets the agent orchestrate many of
your tools in a sandbox and return only the final output, which changes how you'd shape a server meant to scale.
You've kept the catalog small (Lesson 5) and your returns scoped (Lesson 2). But some tasks still chain
many calls whose intermediate data is pure noise. Programmatic tool calling — "Code Mode" — is the
pattern for those: the agent writes code that orchestrates the calls and only the final result enters its context.
1 From a round-trip loop to one code block
Normally the model makes one tool call, reads the result, makes the next — a round-trip per call, every intermediate
result landing in context. Code Mode replaces the loop with a single sandboxed program: the model writes code that
calls your tools, processes the results in the sandbox, and prints only what matters.
# the model writes code that orchestrates many calls in one pass
team = await get_team_members("engineering")
budgets = await asyncio.gather(*[get_budget_by_level(l) for l in levels])
expenses= await asyncio.gather(*[get_expenses(m["id"], "Q3") for m in team])
# only the filtered result is printed — everything else stays in the sandboxprint(json.dumps(over_budget))
Intermediate results are processed in the sandbox and never enter the model's context; only
stdout from the code returns. For a data-heavy chain, that's the difference between paying for thousands of
records and paying for one line.
2 The biggest token lever for large result sets
This is the search-and-execute idea at its extreme. Cloudflare's reference server fronts roughly
2,500 API endpoints with just two tools — search and execute — in about 1K
tokens, then uses code orchestration so a task like "enable DNSSEC on every zone where it's off" loops in the sandbox
and returns only the changed zones instead of pulling thousands of records into the window.
Approach
Tokens (≈2,500 endpoints)
All tools loaded upfront
~1,170,000
Tool search (top-k matching)
~8,700
Code Mode (typed SDK + 2 tools)
~1,000
On complex multi-step research, programmatic calling cut tokens 37% (43.6K → 27.3K)
and eliminated 19+ inference passes by orchestrating 20+ calls in one block. It's a different lever
than tool search: search shrinks definitions; Code Mode shrinks results.
3 When Code Mode is the wrong tool
The sandbox is what makes it cheap, and also what bounds where it fits. Four conditions flip it:
The model must reason over intermediate results — the sandbox returns only stdout,
so any reasoning that needed to see the in-between data is lost.
A few well-defined calls in fixed order — the sandboxing and container overhead isn't justified
for a short, predictable sequence.
Cold starts hurt — containers idle out after ~4.5 minutes, adding latency on the first call back.
You need Zero Data Retention — programmatic calling runs on the code-execution sandbox, which is
not ZDR-eligible. And it's inert anywhere there's no trusted sandbox at all (air-gapped, on-prem).
It layers on tool search, not replaces it
These solve different bottlenecks: tool search for definition bloat, Code Mode for
result bloat. A large remote server often wants both — defer the catalog behind search, then let the agent
code-orchestrate the calls it discovers. Verify sandbox availability before you design a server to depend on it.
↪ Your win: orchestrate in the sandbox, return the answer
Code Mode keeps intermediate results out of context — only stdout returns.
It's the result-set lever; tool search is the definition lever — they compose.
Reach for it on data-heavy chains of 3+ dependent or parallel calls.
Skip it when the model must reason over the in-between data, or for short fixed sequences.
Mind the constraints — cold starts, no ZDR, and it needs a trusted sandbox to exist at all.
Retrieval practice — recall, don't peek
Question 1In programmatic tool calling, what enters the model's context is…
Question 2Code Mode's main token win comes from shrinking…
Question 3Programmatic calling is NOT available when you need…
Question 4Relative to tool search, Code Mode is a lever on…
Ask me anything. Want to judge whether a specific workflow is a Code Mode fit, or check that a server
you're designing exposes tools that code-orchestrate cleanly? Next: the Capstone — every decision in this
course sequenced into one server you can ship.