Token-Efficient Tool Design

Every tool call is a deposit into the context window. A tool that returns 10,000 tokens when 200 would do has spent 10% of a 100K window on a single call.

Why this matters for you: the win compounds across a whole session. Oversized tool output doesn't just cost money; it buries the fields the agent needs in noise it has to attend to. Size output for the next decision and you keep the window full of signal, which is exactly what keeps a long agent run sharp.

Reframe the tool. It isn't a data-access function; it's a context injection. Whatever it returns enters the window and competes for attention with everything else there. The design question is no longer "what can I return?" but "what does the agent need to know to decide what to do next?"

Return only the next decision's inputs. A CI-status tool should return "3 checks passed, 1 failed: lint" — not the 40-field GitHub Actions run object with timestamps, URLs, and raw logs the agent will never read.

1 Why oversized output actively hurts

It's not only opportunity cost. Transformer self-attention computes pairwise relationships across every token, so irrelevant tokens compete with relevant ones for focus. Worse, oversized output strands the useful fields in the low-attention middle — the lost-in-the-middle effect. The agent's ability to act correctly on a field degrades the more noise surrounds it.

The full-passthrough refactor

A CI tool returning the raw run object costs ~400 tokens per call. Compute the summary at the tool layer (how many checks passed, plus the names of any failures) and it drops to ~20. The agent reads "1 failed: lint" and runs the lint fixer immediately: no parsing, nothing discarded.
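A minimal sketch of that refactor, assuming the raw run object carries a `checks` list whose entries have `name` and `conclusion` fields. That shape, the `summarise_ci_run` name, and the sample data are illustrative assumptions, not the real GitHub Actions schema.

```python
from typing import Any


def summarise_ci_run(run: dict[str, Any]) -> str:
    """Collapse a raw CI run object into a one-line status for the agent."""
    # Assumed shape: run["checks"] is a list of {"name": str, "conclusion": str}.
    checks = run.get("checks", [])
    # Anything that did not conclude with "success" is reported as failed.
    failed = [c["name"] for c in checks if c.get("conclusion") != "success"]
    passed = len(checks) - len(failed)

    if not failed:
        return f"{passed} checks passed"
    return f"{passed} checks passed, {len(failed)} failed: {', '.join(failed)}"


# Hypothetical raw object: the timestamps and URLs never reach the agent.
raw_run = {
    "id": 1234,
    "html_url": "https://example.invalid/runs/1234",
    "created_at": "2024-01-01T00:00:00Z",
    "checks": [
        {"name": "build", "conclusion": "success"},
        {"name": "unit-tests", "conclusion": "success"},
        {"name": "typecheck", "conclusion": "success"},
        {"name": "lint", "conclusion": "failure"},
    ],
}

print(summarise_ci_run(raw_run))  # 3 checks passed, 1 failed: lint
```

The summary is computed once, at the tool layer, so the agent never spends tokens parsing or discarding the rest of the object.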

2 The paragraph heuristic

A useful rule of thumb: tool output should fit in a paragraph. If it doesn't, one of three things is true — and each has a different fix.

If the output is large because…	Then…
The tool returns more than the decision needs	Add filtering or summarisation at the tool layer
The task genuinely needs all of it	Load it once and structure it carefully (Lesson 4)
It's bulk the agent re-reads on demand	Write it to a file and return a reference + preview
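The third row of the table (write to a file, return a reference plus a preview) might look like the sketch below. The `offload_result` helper, the 500-character preview budget, and the returned field names are assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

PREVIEW_CHARS = 500  # assumed preview budget; tune to your own context limits


def offload_result(payload: Any, directory: Path, name: str) -> dict[str, Any]:
    """Write a bulky result to disk and return a small reference to it."""
    text = json.dumps(payload, indent=2)
    path = directory / f"{name}.json"
    path.write_text(text, encoding="utf-8")
    return {
        "path": str(path),                # reference the agent can re-read on demand
        "total_chars": len(text),         # tells the agent how much was offloaded
        "preview": text[:PREVIEW_CHARS],  # enough to judge whether to open the file
        "truncated": len(text) > PREVIEW_CHARS,
    }


# Hypothetical bulky payload standing in for raw logs.
scratch = Path(tempfile.mkdtemp())
big_log = {"lines": [f"step {i}: ok" for i in range(2000)]}
result = offload_result(big_log, scratch, "ci_log")
print(result["path"], result["total_chars"], result["truncated"])
```

Only the small dictionary enters the context window; the full payload stays on disk until the agent decides it needs it.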

Prefer IDs and summaries over full objects. Structured output — JSON with named fields, or concise prose — is easier for the agent to process than a raw API dump.

3 The definitions cost tokens too

Output isn't the only injection. Every tool definition sits in context on every turn. A typical multi-server MCP setup can consume ~55,000 tokens in tool definitions alone — before any task work begins. So token efficiency has two fronts: shrink what each call returns, and shrink the standing cost of the toolset itself. Precise descriptions help here too — an ambiguous one forces the agent to spend tokens resolving it before invoking (Lesson 2); a vague toolset taxes every decision (Lesson 6).
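To see where the standing cost sits, a rough audit of the definitions can help. The sketch below assumes about four characters per token, which is only a coarse approximation (real tokenizers vary), and uses two hypothetical tool definitions.

```python
import json

CHARS_PER_TOKEN = 4  # rough assumption; use your model's tokenizer for real numbers


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def definition_costs(tools: list[dict]) -> list[tuple[str, int]]:
    """Estimate the per-turn token cost of each tool definition, largest first."""
    costs = [(tool["name"], estimate_tokens(json.dumps(tool))) for tool in tools]
    return sorted(costs, key=lambda item: item[1], reverse=True)


# Hypothetical definitions: one precise, one padded and vague.
tools = [
    {
        "name": "ci_status",
        "description": "Return pass/fail counts and failing check names for a commit.",
        "parameters": {"commit": "string"},
    },
    {
        "name": "search_everything",
        "description": "Searches things. " * 40,
        "parameters": {"q": "string", "mode": "string", "opts": "object"},
    },
]

costs = definition_costs(tools)
for name, tokens in costs:
    print(f"{name}: ~{tokens} tokens per turn")
print("standing total:", sum(tokens for _, tokens in costs))
```

Sorting largest first points at the definitions worth trimming or removing before any per-call optimisation.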

When filtering backfires

Over-filtering has its own failure modes. A summary that drops "unimportant" fields will eventually drop one a rare-but-valid path needs — and the agent can't ask for data it doesn't know exists. A bespoke summariser also breaks on every upstream schema change, and for a tool called once a session can cost more than it saves. Apply this where output is consistently large or the tool runs in a loop — measure context pressure before you build a summarisation layer.
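One way to measure before building, sketched under stated assumptions: log the token size of each tool result for a session, then flag only the tools whose cumulative output takes a meaningful share of the window. The 100K window matches the lesson's opening example; the 5% threshold and the call log are hypothetical.

```python
from collections import defaultdict

WINDOW_TOKENS = 100_000  # window size from the lesson's opening example
SHARE_THRESHOLD = 0.05   # hypothetical cut-off: 5% of the window per session


def tools_worth_summarising(call_log: list[tuple[str, int]]) -> dict[str, float]:
    """Return each tool whose total output share of the window meets the threshold."""
    totals: dict[str, int] = defaultdict(int)
    for tool, tokens in call_log:
        totals[tool] += tokens
    return {
        tool: total / WINDOW_TOKENS
        for tool, total in totals.items()
        if total / WINDOW_TOKENS >= SHARE_THRESHOLD
    }


# Hypothetical session: one tool runs in a loop, another is called once.
call_log = [("ci_status", 400)] * 20 + [("get_repo_info", 900)]
print(tools_worth_summarising(call_log))  # {'ci_status': 0.08}
```

Here the looped tool crosses the threshold while the larger one-off call does not, which matches the advice to target tools that are consistently large or run in a loop.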

↪ Your win: size every response to the next decision

Treat each tool result as a context injection — return the next decision's inputs, nothing else.
Apply the paragraph heuristic: if output overflows a paragraph, filter, structure, or offload to a file.
Prefer IDs and summaries over full objects; structured fields over raw API dumps.
Shrink the standing cost too — definitions for a bloated toolset can run tens of thousands of tokens before any work.
Don't over-filter blindly: protect rare-path fields, and only build a summariser where output is reliably large.

Retrieval practice — recall, don't peek

Question 1: The right design question for tool output is…

Question 2: Beyond cost, oversized output hurts because irrelevant tokens…

Question 3: The rough sizing heuristic is that tool output should fit in…

Question 4: Aggressively stripping fields backfires mainly because the agent…

Question 5 · spaced recall from Lesson 02: The most common tool-description failure states what the tool does but not…

Next in Part 2: Result Shaping, the toolbox for output that's still too big even after you've trimmed it.