The Lethal Trifecta

An agent isn't dangerous because of one capability. It's dangerous when three meet on the same path — and you can name them.

Why this, for you: almost every agent data-exfiltration incident reduces to the same three legs arriving together. Learn to spot them and you can audit any agent you build in two minutes — and know which leg to cut. This is the lens the rest of the course hangs on.

An LLM cannot reliably tell trusted instructions from injected ones (you'll see why in Lesson 2). So defending an agent is not about making the model smarter — it's about making sure no single execution path holds all three of the capabilities that turn a confused model into a data leak.

1 Name the three legs

Simon Willison's lethal trifecta names three capabilities that are each harmless alone but catastrophic together on one path:

Leg	What it means	Examples
Private data	Secrets, credentials, PII, or proprietary code in scope	`.env`, DB connections, internal repos
Untrusted input	Content the agent did not author and can't fully trust	PR comments, issues, fetched pages, dependencies
External egress	Ability to send data outside the sandbox	HTTP tools, MCP servers with outbound calls

Risk requires all three legs at once. Untrusted input carries the attacker's instruction; private data is the payload; egress is the exit. Remove any one and the exfiltration path closes.

2 Audit per path, not per agent

The unit of analysis is the execution path, not the agent as a whole. One agent can run several paths; only the path that holds three "Yes" values is unsafe.

Execution path	Private?	Untrusted?	Egress?	Safe?
Code-review agent (no network)	Yes	Yes	No	Yes
Research agent (web, no secrets)	No	Yes	Yes	Yes
Deploy agent with env vars + repo config	Yes	Yes	Yes	No
Internal codegen (controlled input)	Yes	No	Yes	Yes

A single path with three "Yes" values demands an architectural fix, not a better prompt.
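The table above can be sketched as data plus one rule. This is a minimal illustration, not a standard tool: the `PathAudit` type and `unsafe_paths` helper are hypothetical names, and the rows are copied from the table.

```python
from typing import List, NamedTuple


class PathAudit(NamedTuple):
    """One execution path and its three trifecta legs (hypothetical type)."""
    name: str
    private: bool    # private data in scope?
    untrusted: bool  # untrusted input on this path?
    egress: bool     # can the path send data out?

    @property
    def safe(self) -> bool:
        # A path is unsafe only when all three legs are present together.
        return not (self.private and self.untrusted and self.egress)


# Rows taken from the audit table in this lesson.
PATHS = [
    PathAudit("Code-review agent (no network)", True, True, False),
    PathAudit("Research agent (web, no secrets)", False, True, True),
    PathAudit("Deploy agent with env vars + repo config", True, True, True),
    PathAudit("Internal codegen (controlled input)", True, False, True),
]


def unsafe_paths(paths: List[PathAudit]) -> List[str]:
    """Return the names of paths that hold all three legs."""
    return [p.name for p in paths if not p.safe]


for path in PATHS:
    print(f"{path.name}: {'safe' if path.safe else 'UNSAFE'}")
```

Auditing a list of paths rather than a single agent-level record keeps the unit of analysis where the lesson puts it: one agent can contribute several rows.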

3 Remove egress first

Which leg to cut depends on the task — but for coding agents the cheapest cut is almost always egress, because most coding tasks need no outbound network at all.

# Default-deny outbound: the model cannot override this
docker run --network none agent-image

This is a deterministic control: it runs below the model, so no injected instruction can talk its way past it. OpenAI ships the same idea as Lockdown Mode — outbound requests capped with no AI in the decision loop.
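If the agent is launched from a harness rather than by hand, the same control can be pinned in code so that no caller (and no model output) chooses the network mode. A minimal sketch, assuming a Python launcher: `sandboxed_docker_argv` and `run_task.py` are hypothetical names, while `--rm` and `--network none` are standard `docker run` flags.

```python
import shlex
from typing import List, Sequence


def sandboxed_docker_argv(image: str, command: Sequence[str] = ()) -> List[str]:
    """Build a `docker run` argv with outbound networking disabled.

    The network mode is hard-coded here, so it is not an argument that
    agent-generated text can influence.
    """
    return ["docker", "run", "--rm", "--network", "none", image, *command]


# Hypothetical task script; replace with whatever the agent actually runs.
argv = sandboxed_docker_argv("agent-image", ["python", "run_task.py"])
print(shlex.join(argv))
# To execute it, pass the list to subprocess.run(argv, check=True).
```

Building the argument list instead of a shell string also means untrusted text never passes through shell parsing.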

A real chain: the poisoned dependency

NVIDIA documented it end-to-end: an agent reads a GitHub issue naming a malicious pip package (untrusted input), installs it, and the package reads env vars (private data) and exfiltrates them over the same outbound network access the install used (egress). Three legs, one path. The fix isn't a smarter model; it's removing the egress leg.

Removing a leg migrates risk — it doesn't erase it

The trifecta is a structural heuristic, not a guarantee. Tokenizing PII shifts the attack to the token resolver; sandboxing egress shifts it to sandbox-escape. Each removed leg creates a new high-value target that must itself be hardened. And "read-only egress" or "partially tokenized data" sit between present and absent — binary Yes/No columns can produce false confidence.
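One way to keep in-between states visible is to grade each leg on three steps instead of two. This is a sketch under assumptions, not part of Willison's framing: the `Level` scale and the `verdict` labels are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """How much of a leg is present on a path (hypothetical scale)."""
    ABSENT = 0
    PARTIAL = 1   # e.g. partially tokenized data, restricted egress
    PRESENT = 2


def verdict(private: Level, untrusted: Level, egress: Level) -> str:
    """Grade a path without rounding partial legs down to 'No'."""
    legs = (private, untrusted, egress)
    if Level.ABSENT in legs:
        # Closed on this path; the control that removed the leg
        # (resolver, proxy, sandbox) still needs its own hardening.
        return "closed"
    if all(leg == Level.PRESENT for leg in legs):
        return "unsafe"
    # All three legs exist in some form: do not report this as safe.
    return "needs review"


print(verdict(Level.PRESENT, Level.PRESENT, Level.PARTIAL))
```

The point of the middle verdict is that a partially mitigated leg never counts as a removed one.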

↪ Your win: a two-minute trifecta audit

List every execution path your agent can run, not just "the agent."
Mark three boxes per path: private data? untrusted input? egress?
Three Yes = unsafe. Fix it architecturally — don't reach for a prompt.
Cut egress first for coding agents (--network none); most tasks never miss it.
Re-audit the new target the removed leg created — the resolver, the proxy, the sandbox edge.

Retrieval practice — recall, don't peek

Question 1: The three legs of the lethal trifecta are…

Question 2: To close the exfiltration path, you need to…

Question 3: The correct unit of a trifecta audit is the…

Question 4: For most coding agents, the cheapest leg to remove is…

Question 5: Removing a leg, e.g. tokenizing PII, mainly…

Ask me anything. Want to run a trifecta audit on an agent you're building, or see how the six design patterns (including Dual LLM, Plan-Then-Execute, and CaMeL) each map to a specific leg? Next in Part 1: The Provenance-Blind Model, on why prompt injection works at all.