Tool Engineering · ~7 min
The whole course assumed a catalog of typed tools. This lesson considers the opposite extreme: hand the agent one run(command) tool and let Unix supply the rest.
Most frameworks register many typed tools — read_file, search_code,
list_directory — each with a schema, an error format, and a slot in the catalog. The alternative collapses
all of them into one execution primitive and lets the agent compose shell commands directly.
run(command) tool can replace a large typed catalog by exploiting the model's
dense pretraining on shell usage. It's the extreme end of consolidation: where Lesson 6 merged overlapping tools, the
single-tool design eliminates tool selection entirely — there is only ever one tool to pick.The catalog gives you discovery, error handling, and composition as bespoke per-tool work. Unix already has all three, and the model already knows them from pretraining:
| What the catalog builds per tool | What Unix supplies for free |
|---|---|
| Tool descriptions for discovery | --help, man — lazy discovery on demand, no upfront schema |
| Custom error format per tool | stderr + exit codes — "command not found" routes the next action |
| Pre-built consolidation | pipes, &&, || — search, filter, transform in one call |
The agent runs gh --help to learn capabilities instead of loading a schema; reads pull request not
found on stderr and adjusts; and chains a query with jq in a single invocation. None of it needed a
bespoke tool definition — which is the catalog token tax from Lesson 11 dropping to near zero.
Raw shell access is a loaded gun pointed at the context window. A kubectl get pods returns a page; a PNG
returns binary garbage that fills the window with uninterpretable bytes. The fix is a two-layer
architecture: the agent works in raw CLI, and a presentation layer handles what the agent should not.
| Presentation guard | What it prevents |
|---|---|
| Binary guard | Non-text output (a PNG) poisoning context — return a placeholder |
| Overflow truncation | A huge result overflowing — preserve head and tail (the L4 PARTIAL shape) |
| Stderr attachment | Silent failure — surface stderr alongside stdout so the agent can route on it |
A second move pulls in the same direction: wrapper scripts that pre-filter at the source. Instead of
returning the full pod table, a check-pods.sh returns only the non-running pods as JSON. This is Lesson 3's
paragraph heuristic and Lesson 4's result-shaping, applied to CLI output — "return only what the next decision needs",
enforced by the script rather than the schema. Anthropic measured a related filter-before-return pattern cutting a
workload from 150,000 to 2,000 tokens.
This isn't all-or-nothing. The single-tool design and the typed catalog are two ends of one axis, and most production systems land between them. Pick by what the task actually needs:
Single run(command) wins | Typed tools win |
|---|---|
| High pretraining alignment — model knows the CLI | Strong parameter constraints — enums, bounded ranges |
| Composition via pipes is the natural shape | High-security surfaces needing per-tool gating |
Discovery is cheap (--help exists) | Multimodal payloads — images, audio |
The honest middle: five well-designed tools plus shell access captures most of the benefit without
unrestricted-execution risk. And the catalog's parameter constraints aren't free to give up — a free-form command string
is the opposite of the poka-yoke from Lesson 2. When you do ship CLIs for agents, design them for machine consumption:
a --json flag, distinct exit codes, --dry-run, and --yes/--force to
kill interactive prompts an agent has no stdin to answer.
One run(command) tool means arbitrary execution — the broadest possible
security surface, where each typed tool was constrained by construction. This is exactly where Lesson 13's hooks earn
their place: a PreToolUse matcher on the Bash / run tool is the deterministic
gate that bounds what an otherwise-unbounded primitive may do. The single-tool design and the hook are complements —
the CLI gives reach, the hook draws the line.
run(command) tool exploits shell pretraining and eliminates tool selection — the far end of consolidation.--help, stderr + exit codes, pipes.PreToolUse hook on the run tool.Retrieval practice — recall, don't peek
Question 1A single run(command) tool works because models have dense pretraining on…
Question 2The Unix mechanism that gives lazy, on-demand discovery without loading a schema is…
Question 3The presentation layer's binary guard exists to stop non-text output from…
Question 4The practical middle of the spectrum, capturing most of the upside, is…
Question 5 · spaced recall from Lesson 13The deterministic gate that bounds what an arbitrary-execution tool may do is a…
run() tool you're considering? Next, the Capstone —
every surface and mechanism from the course in one decision table, plus a mixed review spanning selection to
enforcement.