Part 3 · Managing the Set

Tool Engineering · ~7 min

Consolidation vs Sprawl

More tools feel like more capability. To the agent they are more reasoning tax and, past a point, measurably worse accuracy. The fix is fewer, well-scoped tools.

Why this, for you: the lesson that fixes a whole toolset at once, not one tool. If your agent hesitates between two search tools or chains five calls for one logical action, the problem is the shape of the set. Consolidating it is a structural performance win that also frees context for the actual task.

The default mistake is to mirror the API: one tool per endpoint, one per operation. That produces a large set where the agent must chain calls for a single logical action — and must reason, at each step, about which tool comes next. Every one of those decisions is an opportunity for the wrong choice.

More tools do not, by themselves, improve agent outcomes. In LongFuncEval, expanding the tool catalog caused accuracy drops of 7–85% depending on the model, driven by a lost-in-the-middle effect: the correct tool gets harder to find among distractors. Consolidation removes the distractors at the source.

1 The consolidation principle

Each tool should map to one distinct, human-understandable sub-task. The test is mechanical: if two tools are always called together, they are one sub-task and should be one tool.

# Sprawl: agent chains five, deciding the order each time
search_flights · get_flight_details · hold_flight · create_booking · send_confirmation

# Consolidated: two clear intents
find_flights    # search + details (always called together)
book_flight     # hold + book + confirm (never split in practice)

Now the agent selects between "find" and "book" instead of reasoning about a five-step pipeline.
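As a minimal sketch of what that consolidation could look like in code, the two agent-facing tools below wrap the five endpoint-level operations. The bodies of the five underlying functions are hypothetical stubs standing in for real API calls, and the `AGENT_TOOLS` registry, the argument names, and the return shapes are all assumptions made for illustration.

```python
# Hypothetical stand-ins for the five endpoint-level operations.
# In a real system each would call the underlying API.
def search_flights(origin: str, dest: str) -> list[str]:
    return [f"{origin}-{dest}-001", f"{origin}-{dest}-002"]

def get_flight_details(flight_id: str) -> dict:
    return {"id": flight_id, "price": 199.0}

def hold_flight(flight_id: str) -> str:
    return f"hold:{flight_id}"

def create_booking(hold_id: str, passenger: str) -> str:
    return f"booking:{hold_id}:{passenger}"

def send_confirmation(booking_id: str) -> bool:
    return True

# Consolidated tool 1: search + details, which are always called together.
def find_flights(origin: str, dest: str) -> list[dict]:
    return [get_flight_details(fid) for fid in search_flights(origin, dest)]

# Consolidated tool 2: hold + book + confirm, never split in practice.
def book_flight(flight_id: str, passenger: str) -> dict:
    hold_id = hold_flight(flight_id)
    booking_id = create_booking(hold_id, passenger)
    confirmed = send_confirmation(booking_id)
    return {"booking_id": booking_id, "confirmed": confirmed}

# Only the two intent-level tools are exposed to the agent.
AGENT_TOOLS = {"find_flights": find_flights, "book_flight": book_flight}
```

The endpoint-level functions still exist; they are simply no longer part of the surface the agent has to reason about.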

2 Overlap is the enemy, not count

The real cost driver is overlapping function, which produces two failure modes: redundant calls (the agent calls both when one would do) and wrong selection (it picks the less appropriate one because the distinction is unclear). OpenAI's data-agent team found exposing their full tool set was "confusing to agents"; consolidating and restricting — even removing valid options — improved end-to-end reliability. When multiple related tools are genuinely necessary, group them under a namespace prefix (asana_search, asana_task_create) so the relationship is explicit.
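One rough way to look for overlap is to compare tool descriptions pairwise and to check which tools share a namespace prefix. The sketch below is an assumption-laden heuristic, not a method from the sources cited above: it uses plain word-set Jaccard similarity, an arbitrary threshold of 0.5, and hypothetical tool names and descriptions (`search_docs`, `find_docs`, `create_task`, `jira_search`).

```python
from collections import defaultdict
from itertools import combinations

def group_by_namespace(tool_names: list[str]) -> dict[str, list[str]]:
    """Group tools by the prefix before the first underscore."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in tool_names:
        prefix = name.split("_", 1)[0]
        groups[prefix].append(name)
    return dict(groups)

def _tokens(text: str) -> set[str]:
    return set(text.lower().split())

def overlapping_pairs(descriptions: dict[str, str], threshold: float = 0.5):
    """Flag tool pairs whose descriptions share many words (Jaccard similarity)."""
    flagged = []
    for (a, desc_a), (b, desc_b) in combinations(sorted(descriptions.items()), 2):
        tokens_a, tokens_b = _tokens(desc_a), _tokens(desc_b)
        union = tokens_a | tokens_b
        score = len(tokens_a & tokens_b) / len(union) if union else 0.0
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged

# Hypothetical descriptions for illustration only.
descriptions = {
    "search_docs": "search documents by keyword",
    "find_docs": "search documents by keyword or title",
    "create_task": "create a new task",
}
print(overlapping_pairs(descriptions))
print(group_by_namespace(["asana_search", "asana_task_create", "jira_search"]))
```

A flagged pair is only a prompt for human review: word overlap can miss tools that do the same thing in different words, and can flag tools that merely share vocabulary.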

Consolidate, but watch the parameter surface

Collapsing tools helps only if the agent can still choose reliably. A unified search with mode={text,semantic,symbol} wins only when the model picks the right mode consistently; otherwise you have moved the ambiguity from tool choice into parameter choice and gained nothing. The same holds for a generic do_action(system, verb, payload): it pushes selection into parameter space and discards the per-tool schemas that made each call legible.
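A sketch of keeping the parameter surface legible, under stated assumptions: the mode is a closed enum with a one-line meaning per value, invalid modes return an actionable error, and mode selection is measured on labeled cases before the merge is trusted. The schema layout is a generic JSON-Schema-style dict rather than any particular vendor's format, and `search`, `mode_accuracy`, and the mode descriptions are hypothetical.

```python
from typing import Literal, get_args

SearchMode = Literal["text", "semantic", "symbol"]
MODES = list(get_args(SearchMode))

# Hypothetical tool schema: the enum and per-mode descriptions are what
# keep the choice legible once three tools become one.
SEARCH_TOOL_SCHEMA = {
    "name": "search",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "mode": {
                "type": "string",
                "enum": MODES,
                "description": (
                    "text: exact keyword match; "
                    "semantic: match by meaning; "
                    "symbol: code identifiers"
                ),
            },
        },
        "required": ["query", "mode"],
    },
}

def search(query: str, mode: str) -> dict:
    """Reject unknown modes with an error the agent can act on."""
    if mode not in MODES:
        return {"ok": False, "error": f"Unknown mode {mode!r}; expected one of {MODES}"}
    return {"ok": True, "mode": mode, "query": query, "results": []}

def mode_accuracy(cases: list[tuple[str, str]]) -> float:
    """Fraction of (expected_mode, chosen_mode) pairs where the model chose correctly."""
    if not cases:
        return 0.0
    return sum(expected == chosen for expected, chosen in cases) / len(cases)
```

If `mode_accuracy` on a representative set of queries is no better than the tool-selection accuracy you had before merging, the consolidation has only relocated the ambiguity.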

3 When NOT to consolidate

Over-consolidation has its own failures. A merged tool that does too much becomes a black box: when find_and_book_flight silently fails at the hold step, it looks identical to a failure at confirmation, and the agent can't reason about which step broke. Keep tools separate when they:

Don't merge if the tools… | Because…
Serve distinct sub-tasks not always done together | Forcing a merged call wastes tokens and obscures intent
Have different permission requirements | Merging grants excess access to every caller
Have wildly different output schemas | The merged response becomes incoherent to pattern-match

The test: does the merged tool still map to one clear human action? If describing it takes a paragraph, it's over-consolidated. If two sub-tasks are sometimes called together but not always, keep them separate and let the agent compose.
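The criteria in this section can be written down as a small checklist function. This is a sketch of the decision rule only; the `MergeCandidate` fields and the `merge_decision` name are hypothetical, and the judgments fed into it (for example, whether two tools are "always" called together) still have to come from your own traces and review.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MergeCandidate:
    always_called_together: bool      # never used separately in practice
    same_permissions: bool            # identical permission requirements
    similar_output_schemas: bool      # outputs can share one coherent shape
    one_sentence_description: bool    # merged tool maps to one clear human action

def merge_decision(c: MergeCandidate) -> tuple[bool, list[str]]:
    """Return (should_merge, reasons_against). Merge only if no reason applies."""
    reasons = []
    if not c.always_called_together:
        reasons.append("distinct sub-tasks not always done together")
    if not c.same_permissions:
        reasons.append("different permission requirements")
    if not c.similar_output_schemas:
        reasons.append("wildly different output schemas")
    if not c.one_sentence_description:
        reasons.append("merged tool needs a paragraph to describe")
    return (not reasons, reasons)

# search + details: always together, same access, describable in a sentence.
print(merge_decision(MergeCandidate(True, True, True, True)))
# find + book: sometimes together, and booking needs payment permissions.
print(merge_decision(MergeCandidate(False, False, True, False)))
```

Returning the reasons alongside the verdict keeps the audit reviewable: a blocked merge states which criterion blocked it.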

4 High-level prompting is the other half

Minimal tools pair with goal-oriented prompting. Prescriptive step-by-step instructions anchor the agent to one procedure that breaks when the task varies — "rigid instructions often pushed the agent down incorrect paths." Define the outcome and constraints, not the steps; the model has information about what it found that the prompt author didn't. (Weaker models are the exception — they benefit more from procedural scaffolding.)
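To make "outcome and constraints, not steps" concrete, here is a minimal sketch of a prompt builder that takes only a goal and a list of constraints. The function name, the prompt wording, and the example task are illustrative assumptions, not a recommended template.

```python
def build_goal_prompt(outcome: str, constraints: list[str]) -> str:
    """Assemble a goal-oriented prompt: what to achieve and within what limits."""
    lines = [f"Goal: {outcome}", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    # Deliberately no numbered procedure: tool order is left to the agent.
    lines.append("Choose the tools and their order yourself, based on what you find.")
    return "\n".join(lines)

prompt = build_goal_prompt(
    "Book the cheapest nonstop flight from SFO to JFK on Friday",
    ["Budget under $400", "Do not book without a successful hold"],
)
print(prompt)
```

The prescriptive alternative would hard-code "first call find_flights, then call book_flight on the first result", which breaks as soon as the first result violates a constraint the prompt author did not anticipate.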

↪ Your win: audit the set, not just the tools
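One way to start such an audit, assuming you log the ordered tool calls of each agent run, is to look for pairs of tools that only ever appear back to back. The sketch below is a hypothetical helper: the trace format (one list of tool names per run), the `min_support` cutoff, and the example traces are all assumptions.

```python
from collections import Counter

def always_adjacent_pairs(traces: list[list[str]], min_support: int = 3):
    """Find (a, b) where every call to a is immediately followed by b,
    and every call to b is immediately preceded by a."""
    call_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for trace in traces:
        for i, tool in enumerate(trace):
            call_counts[tool] += 1
            if i + 1 < len(trace):
                pair_counts[(tool, trace[i + 1])] += 1
    return sorted(
        (a, b)
        for (a, b), n in pair_counts.items()
        if n >= min_support and n == call_counts[a] and n == call_counts[b]
    )

# Hypothetical traces: three full bookings and one search-only run.
full = ["search_flights", "get_flight_details", "hold_flight",
        "create_booking", "send_confirmation"]
traces = [full, ["search_flights", "get_flight_details"], full, full]
print(always_adjacent_pairs(traces))
```

On these traces the search/details pair and the hold/book/confirm chain surface as merge candidates, while details-then-hold does not, because one run stopped after viewing details.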

Retrieval practice — recall, don't peek

Question 1: Two tools that are always called together should be…

Question 2: The real driver of selection errors in a large toolset is…

Question 3: Over-consolidating into one black-box tool mainly costs you…

Question 4: A unified search with a mode parameter only helps if…

Question 5 · spaced recall from Lesson 05: An agent-facing error message should, above all,…

Next, Part 4 opens with Idempotent, Safe-to-Retry Tools: the surface that decides whether a re-run heals or duplicates.