Security · ~7 min
An agent isn't dangerous because of one capability. It's dangerous when three meet on the same path — and you can name them.
An LLM cannot reliably tell trusted instructions from injected ones (you'll see why in Lesson 2). So defending an agent is not about making the model smarter — it's about making sure no single execution path holds all three of the capabilities that turn a confused model into a data leak.
Simon Willison's lethal trifecta names three capabilities that are each harmless alone but catastrophic together on one path:
| Leg | What it means | Examples |
|---|---|---|
| Private data | Secrets, credentials, PII, or proprietary code in scope | .env, DB connections, internal repos |
| Untrusted input | Content the agent did not author and can't fully trust | PR comments, issues, fetched pages, dependencies |
| External egress | Ability to send data outside the sandbox | HTTP tools, MCP servers with outbound calls |
The unit of analysis is the execution path, not the agent as a whole. One agent can run several paths; only a path with "Yes" in all three columns is unsafe.
| Execution path | Private? | Untrusted? | Egress? | Safe? |
|---|---|---|---|---|
| Code-review agent (no network) | Yes | Yes | No | Yes |
| Research agent (web, no secrets) | No | Yes | Yes | Yes |
| Deploy agent with env vars + repo config | Yes | Yes | Yes | No |
| Internal codegen (controlled input) | Yes | No | Yes | Yes |
A single path with three "Yes" values demands an architectural fix, not a better prompt.
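The audit table above can be sketched as a few lines of Python. This is a minimal illustration, not a prescribed tool: the `ExecutionPath` class, its field names, and the `unsafe_paths` helper are all hypothetical, and the four rows simply restate the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionPath:
    """One row of the audit table. All names here are illustrative."""
    name: str
    private_data: bool
    untrusted_input: bool
    external_egress: bool

    def legs_present(self) -> int:
        # Booleans sum as 0/1, so this counts the "Yes" columns.
        return sum((self.private_data, self.untrusted_input, self.external_egress))

    def holds_trifecta(self) -> bool:
        # A path is flagged only when all three legs meet on it.
        return self.legs_present() == 3


# The four rows from the table, in the same order.
PATHS = [
    ExecutionPath("Code-review agent (no network)", True, True, False),
    ExecutionPath("Research agent (web, no secrets)", False, True, True),
    ExecutionPath("Deploy agent with env vars + repo config", True, True, True),
    ExecutionPath("Internal codegen (controlled input)", True, False, True),
]


def unsafe_paths(paths):
    """Return the names of paths that hold all three legs."""
    return [p.name for p in paths if p.holds_trifecta()]


if __name__ == "__main__":
    for name in unsafe_paths(PATHS):
        print(f"needs an architectural fix: {name}")
```

Auditing per path rather than per agent is the point of the sketch: the same agent can appear in several rows, and only the row with three legs is flagged.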
Which leg to cut depends on the task — but for coding agents the cheapest cut is almost always egress, because most coding tasks need no outbound network at all.
Cutting egress at the sandbox or network layer is a deterministic control: it runs below the model, so no injected instruction can talk its way past it. OpenAI ships the same idea as Lockdown Mode — outbound requests capped with no AI in the decision loop.
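One way to cut the egress leg for a containerized coding agent is Docker's `--network none` option, which starts the container without external network access. The sketch below only builds the command line. The image name `agent-sandbox:latest` and the script `run_agent.py` are placeholders, not names from this lesson.

```python
import shlex


def sandboxed_command(image: str, agent_cmd: list[str]) -> list[str]:
    """Build a `docker run` invocation with networking disabled."""
    return [
        "docker", "run",
        "--rm",               # remove the container when it exits
        "--network", "none",  # no external network access for the container
        image,
        *agent_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical image and entry point, for illustration only.
    cmd = sandboxed_command("agent-sandbox:latest", ["python", "run_agent.py"])
    print(shlex.join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Because the restriction is applied by the container runtime rather than by a prompt, the model's output has no say in whether an outbound request succeeds.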
NVIDIA documented it end-to-end: an agent reads a GitHub issue naming a malicious pip package (untrusted input), installs it (egress), and the package exfiltrates env vars (private data). Three legs, one path. The fix isn't a smarter model — it's removing the egress leg.
The trifecta is a structural heuristic, not a guarantee. Tokenizing PII shifts the attack to the token resolver; sandboxing egress shifts it to sandbox-escape. Each removed leg creates a new high-value target that must itself be hardened. And "read-only egress" or "partially tokenized data" sit between present and absent — binary Yes/No columns can produce false confidence.
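To keep the in-between cases visible, an audit can record each leg with three values instead of two. The sketch below is one possible encoding, assuming a conservative policy in which a partial leg still counts as present. The `Leg` enum, the `audit` function, and the verdict strings are all hypothetical.

```python
from enum import Enum


class Leg(Enum):
    ABSENT = 0
    PARTIAL = 1   # e.g. read-only egress, partially tokenized data
    PRESENT = 2


def audit(private: Leg, untrusted: Leg, egress: Leg) -> str:
    """Classify one execution path without collapsing PARTIAL into 'No'."""
    legs = (private, untrusted, egress)
    if any(leg is Leg.ABSENT for leg in legs):
        # A leg is cut, but whatever replaced it is a new target.
        return "no full trifecta; harden whatever replaced the missing leg"
    if all(leg is Leg.PRESENT for leg in legs):
        return "trifecta: architectural fix required"
    # No leg is fully absent, so treat the path as exposed until reviewed.
    return "needs review: a partial leg still counts as present"


if __name__ == "__main__":
    print(audit(Leg.PRESENT, Leg.PRESENT, Leg.PARTIAL))
```

The third verdict is the one a binary table cannot express: the path is not a confirmed trifecta, but it is not safe to mark it "No" either.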
--network none); most tasks never miss it.
Retrieval practice: recall, don't peek
Question 1: The three legs of the lethal trifecta are…
Question 2: To close the exfiltration path, you need to…
Question 3: The correct unit of a trifecta audit is the…
Question 4: For most coding agents, the cheapest leg to remove is…
Question 5: Removing a leg, e.g. tokenizing PII, mainly…