Part 1 · The Threat Model

Security · ~7 min

The Lethal Trifecta

An agent isn't dangerous because of one capability. It's dangerous when three meet on the same path — and you can name them.

Why this, for you: almost every agent data-exfiltration incident reduces to the same three legs arriving together. Learn to spot them and you can audit any agent you build in two minutes — and know which leg to cut. This is the lens the rest of the course hangs on.

An LLM cannot reliably tell trusted instructions from injected ones (you'll see why in Lesson 2). So defending an agent is not about making the model smarter — it's about making sure no single execution path holds all three of the capabilities that turn a confused model into a data leak.

1 Name the three legs

Simon Willison's lethal trifecta names three capabilities that are each harmless alone but catastrophic together on one path:

| Leg | What it means | Examples |
| --- | --- | --- |
| Private data | Secrets, credentials, PII, or proprietary code in scope | .env, DB connections, internal repos |
| Untrusted input | Content the agent did not author and can't fully trust | PR comments, issues, fetched pages, dependencies |
| External egress | Ability to send data outside the sandbox | HTTP tools, MCP servers with outbound calls |

Risk requires all three legs at once. Untrusted input carries the attacker's instruction; private data is the payload; egress is the exit. Remove any one and the exfiltration path closes.
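The "all three at once" rule can be sketched as a set check. This is a minimal illustration, not a tool from the course: the `Leg` enum and `is_lethal` function are hypothetical names introduced here.

```python
from enum import Enum


class Leg(Enum):
    """The three trifecta legs (names are illustrative)."""
    PRIVATE_DATA = "private data"
    UNTRUSTED_INPUT = "untrusted input"
    EXTERNAL_EGRESS = "external egress"


# The full trifecta: every leg present on the same path.
TRIFECTA = frozenset(Leg)


def is_lethal(legs) -> bool:
    """Return True only when all three legs are present together."""
    return TRIFECTA <= frozenset(legs)


# A path holding every leg is lethal...
full_path = set(Leg)
print(is_lethal(full_path))  # True

# ...and cutting any single leg closes it.
for leg in Leg:
    print(leg.value, "removed ->", is_lethal(full_path - {leg}))  # False each time
```

The check is deliberately all-or-nothing: two legs out of three return `False`, which matches the claim that removing any one leg closes the exfiltration path.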

2 Audit per path, not per agent

The unit of analysis is the execution path, not the agent as a whole. One agent can run several paths; only the path that holds three "Yes" values is unsafe.

| Execution path | Private? | Untrusted? | Egress? | Safe? |
| --- | --- | --- | --- | --- |
| Code-review agent (no network) | Yes | Yes | No | Yes |
| Research agent (web, no secrets) | No | Yes | Yes | Yes |
| Deploy agent with env vars + repo config | Yes | Yes | Yes | No |
| Internal codegen (controlled input) | Yes | No | Yes | Yes |

A single path with three "Yes" values demands an architectural fix, not a better prompt.
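A per-path audit like this one is easy to express in code. The sketch below encodes the four example paths and reports those holding all three legs; `ExecutionPath` and `audit` are hypothetical names chosen for illustration, and in practice you would fill in the values by hand for your own agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionPath:
    """One row of the audit: a single path through an agent."""
    name: str
    private_data: bool
    untrusted_input: bool
    external_egress: bool

    @property
    def is_safe(self) -> bool:
        # Unsafe only when all three legs meet on this path.
        return not (self.private_data and self.untrusted_input and self.external_egress)


def audit(paths):
    """Return the names of paths that hold all three legs."""
    return [p.name for p in paths if not p.is_safe]


PATHS = [
    ExecutionPath("Code-review agent (no network)", True, True, False),
    ExecutionPath("Research agent (web, no secrets)", False, True, True),
    ExecutionPath("Deploy agent with env vars + repo config", True, True, True),
    ExecutionPath("Internal codegen (controlled input)", True, False, True),
]

print(audit(PATHS))  # ['Deploy agent with env vars + repo config']
```

Modeling paths rather than agents keeps the result specific: the output names the one path to redesign instead of condemning the whole agent.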

3 Remove egress first

Which leg to cut depends on the task — but for coding agents the cheapest cut is almost always egress, because most coding tasks need no outbound network at all.

    # Default-deny outbound: the model cannot override this
    docker run --network none agent-image

This is a deterministic control: it runs below the model, so no injected instruction can talk its way past it. OpenAI ships the same idea as Lockdown Mode — outbound requests capped with no AI in the decision loop.
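Where a task does need some outbound access, the same default-deny idea can be sketched at the tool layer. This is an assumption-laden illustration, not a replacement for network isolation: `check_egress`, `EgressDenied`, and `ALLOWED_HOSTS` are hypothetical names, and an in-process check is weaker than `--network none` because code the agent executes could bypass it.

```python
from urllib.parse import urlsplit


class EgressDenied(Exception):
    """Raised when a tool call targets a host outside the allowlist."""


# Hypothetical allowlist. Empty means deny everything by default.
ALLOWED_HOSTS = frozenset()


def check_egress(url: str, allowed_hosts=ALLOWED_HOSTS) -> str:
    """Return the URL if its host is allowlisted, otherwise raise.

    The decision is plain code: nothing the model outputs can change it.
    """
    host = urlsplit(url).hostname  # None for strings without a network location
    if host is None or host not in allowed_hosts:
        raise EgressDenied(f"outbound request blocked: {url!r}")
    return url


# With the default (empty) allowlist every request is refused.
try:
    check_egress("https://attacker.example/collect")
except EgressDenied as exc:
    print(exc)

# An explicit allowlist admits only the named host.
print(check_egress("https://pypi.org/simple/", frozenset({"pypi.org"})))
```

A wrapper like this would sit between the model's tool call and the HTTP client, so the allowlist is consulted on every request regardless of what the prompt says.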

A real chain: the poisoned dependency

NVIDIA documented the chain end to end: an agent reads a GitHub issue naming a malicious pip package (untrusted input) and installs it over the network; the package then reads the env vars (private data) and sends them out (egress). Three legs, one path. The fix isn't a smarter model; it's removing the egress leg.

Removing a leg migrates risk — it doesn't erase it

The trifecta is a structural heuristic, not a guarantee. Tokenizing PII shifts the attack to the token resolver; sandboxing egress shifts it to sandbox-escape. Each removed leg creates a new high-value target that must itself be hardened. And "read-only egress" or "partially tokenized data" sit between present and absent — binary Yes/No columns can produce false confidence.
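One way to keep the in-between cases visible is to record each leg with three levels instead of Yes/No. The sketch below is a hypothetical extension of the audit, not part of Willison's formulation: `Exposure` and `verdict` are illustrative names, and the "needs review" outcome is an assumption about how you might want partial legs handled.

```python
from enum import IntEnum


class Exposure(IntEnum):
    """How strongly a leg is present on a path."""
    ABSENT = 0
    PARTIAL = 1   # e.g. read-only egress, partially tokenized data
    PRESENT = 2


def verdict(private_data: Exposure, untrusted_input: Exposure,
            external_egress: Exposure) -> str:
    """Classify a path without collapsing partial legs to Yes or No."""
    legs = (private_data, untrusted_input, external_egress)
    if Exposure.ABSENT in legs:
        # A leg is cut, but whatever removed it is now a target to harden.
        return "path closed"
    if all(leg is Exposure.PRESENT for leg in legs):
        return "unsafe"
    # No leg is absent and at least one is only partially removed.
    return "needs review"


print(verdict(Exposure.PRESENT, Exposure.PRESENT, Exposure.PRESENT))  # unsafe
print(verdict(Exposure.PRESENT, Exposure.PRESENT, Exposure.PARTIAL))  # needs review
print(verdict(Exposure.PRESENT, Exposure.PRESENT, Exposure.ABSENT))   # path closed
```

The middle outcome is the point: a binary table would have to call read-only egress either "Yes" or "No", and either answer hides the fact that someone still has to inspect it.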

↪ Your win: a two-minute trifecta audit

Retrieval practice — recall, don't peek

Question 1: The three legs of the lethal trifecta are…

Question 2: To close the exfiltration path, you need to…

Question 3: The correct unit of a trifecta audit is the…

Question 4: For most coding agents, the cheapest leg to remove is…

Question 5: Removing a leg, e.g. tokenizing PII, mainly…

Ask me anything. Want to run a trifecta audit on an agent you're building, or see how the six design patterns (Dual LLM, Plan-Then-Execute, CaMeL) each map to a specific leg? Next in Part 1: The Provenance-Blind Model — why prompt injection works at all.