Security · ~9 min
Eighteen lessons, one reflex: name the legs, locate the boundary, fix it below the model. Here's the full lookup table and the worked case.
Everything in this course reduces to one loop of three questions. Which legs of the trifecta (private data, untrusted input, egress) are on this path? Where is the trust boundary? What deterministic control sits below the model to enforce it? The model is never the gate.
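The loop's first question can be made mechanical. The sketch below assumes each execution path can be described by three boolean flags. `PathCapabilities` and `review` are hypothetical names. A real harness would derive the flags from its own tool and credential configuration rather than asking the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathCapabilities:
    """Hypothetical description of one agent execution path."""
    private_data: bool      # can read secrets, tokens, or private files
    untrusted_input: bool   # ingests attacker-writable content
    egress: bool            # can send data outside the trust boundary

    def legs(self) -> list[str]:
        names = []
        if self.private_data:
            names.append("private_data")
        if self.untrusted_input:
            names.append("untrusted_input")
        if self.egress:
            names.append("egress")
        return names

    def is_trifecta(self) -> bool:
        # The path is unsafe only when all three legs sit on it together.
        return self.private_data and self.untrusted_input and self.egress


def review(path: PathCapabilities) -> str:
    if not path.is_trifecta():
        return "ok: legs present = " + (", ".join(path.legs()) or "none")
    # The remedy is structural: take a leg off the path instead of asking the model to behave.
    return "unsafe: remove a leg (egress first for coding agents)"
```

For example, `review(PathCapabilities(True, True, True))` flags the path. Dropping any one flag lets it pass.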
| Symptom you observe | Mitigation | Lesson |
|---|---|---|
| Agent has private data, untrusted input, and egress on one path | Remove a leg — egress first for coding agents | 1 |
| Agent obeys instructions buried in a fetched page or repo file | Treat all retrieved bytes as untrusted; move defense below the model | 2 |
| A secret could appear in a prompt, log, or generated file | Inject as env var / wrapper script; audit the env per session | 3 |
| Agent could write a startup script or exfiltrate a read file | Dual-boundary OS-level sandbox: filesystem and network walls enforced together | 4 |
| Injected agent could connect to an attacker URL | Harness-enforced egress policy; denies override allow wildcards | 5 |
| Web page picks the agent's next consequential action | Plan-then-execute; quarantine extracted values from the planner | 6 |
| You're relying on one control to hold under attack | Layer independent mechanisms; prefer schema filtering over runtime rejection | 7 |
| One agent holds far more permission than its task needs | Least privilege; decompose into narrow chains; add a kill path | 8 |
| Agent overreaches on a benign task (deletes the unasked file) | Pick an ask-to-continue framework or deterministic deny-list, not a better model | 9 |
| Agent runs untrusted code on a shared, multi-tenant host | Choose a microVM runtime; patch the VMM as hard as the guest | 10 |
| Many MCP tools, each with its own ad-hoc authorisation | One runtime control plane; deterministic policy with argument inspection | 11 |
| A long-lived static API key sits on the agent runtime | Workload identity federation; pin the rule's match block narrowly | 12 |
| Agent installs a package name it invented | Lockfile-enforced install / mirror; gate install authority, not the model | 13 |
| Agent fetches a URL built from untrusted content | Public-web index gate; disable redirect-following; or strict egress | 14 |
| Agent output is executed or rendered downstream unchecked | Per-sink validation; parameterise, encode, allowlist — let the schema validate | 15 |
| An untrusted read plants a payload in long-term memory | User-only memory writes; deny tool-return sources; egress allow-list | 16 |
| Retrieval returns another tenant's chunk from a shared index | Gate the search space — pre-filter ABAC, post-filter the top-K | 17 |
| A crafted prompt or stolen key drains the bill unseen | Stack the five bounds; tuple-key on (user, repo, model); counters not classifiers | 18 |
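Lesson 5's rule that denies override allow wildcards is a question of evaluation order. The sketch below assumes hostname-level, shell-style patterns. The `egress_allowed` function and the example hosts are hypothetical. In practice this check belongs in the harness or an egress proxy, never in the model's prompt.

```python
from fnmatch import fnmatchcase

# Hypothetical policy. Patterns are shell-style, so "*" also matches dots.
ALLOW = ["*.github.com", "pypi.org", "files.pythonhosted.org"]
DENY = ["gist.github.com", "*.githubusercontent.com"]


def egress_allowed(host: str, allow: list[str] = ALLOW, deny: list[str] = DENY) -> bool:
    # Normalise the hostname: lowercase and drop a trailing root dot.
    host = host.lower().rstrip(".")
    # Check denies first so a deny always beats a matching allow wildcard.
    if any(fnmatchcase(host, pattern) for pattern in deny):
        return False
    # Default-deny: anything not explicitly allowed is blocked.
    return any(fnmatchcase(host, pattern) for pattern in allow)
```

Because denies are evaluated first, `gist.github.com` is blocked even though `*.github.com` would otherwise allow it. An unlisted host such as `attacker.example` falls through to the default deny.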
The default "AI reviewer in GitHub Actions" is the trifecta by construction. It ingests PR titles, issue bodies, and comments (untrusted, and attacker-writable on a public repo), while the same runtime holds `GITHUB_TOKEN` and pipeline secrets (private data) plus write tools (egress). The GitInject study found every tested provider susceptible in its default configuration. One variant was rated CVSS 9.4: a malicious PR title caused the agent to dump its environment into a public comment.
Apply the table. Split the agent (Lesson 8 decomposition). A read-only reviewer ingests the untrusted content, and a separately credentialed actor receives only structured operations filtered against an allow-list. The trifecta breaks at the workflow level, not at the prompt.
Two-agent isolation cut the measured attack success rate to 0.31%, and full read/write separation brought it to 0%. Architectural separation beats prompt mitigation because the model is not the gate.
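The actor side of the reviewer/actor split can be sketched as a deterministic filter over the reviewer's output. This is a minimal sketch under three assumptions: the reviewer emits a JSON list of `{"op": ..., "args": ...}` proposals, the credentialed actor runs this filter before doing anything, and the operation names, labels, and length cap are illustrations only. `filter_proposals` and every name in the allow-list are hypothetical, not a real GitHub or vendor API.

```python
import json

ALLOWED_LABELS = {"needs-review", "lgtm", "bug"}  # hypothetical label set
MAX_COMMENT_CHARS = 2000                          # hypothetical length cap


def _valid_label(args: dict) -> bool:
    label = args.get("label")
    return set(args) == {"label"} and isinstance(label, str) and label in ALLOWED_LABELS


def _valid_comment(args: dict) -> bool:
    body = args.get("body")
    return set(args) == {"body"} and isinstance(body, str) and len(body) <= MAX_COMMENT_CHARS


# Map each operation name to its argument validator. Nothing outside this table can run.
ALLOWED_OPS = {"add_label": _valid_label, "post_review_comment": _valid_comment}


def filter_proposals(raw_reviewer_output: str) -> list[dict]:
    """Runs in the credentialed actor. The reviewer's output is treated as untrusted text."""
    try:
        proposals = json.loads(raw_reviewer_output)
    except json.JSONDecodeError:
        return []  # unparseable output is dropped, never executed
    if not isinstance(proposals, list):
        return []
    accepted = []
    for item in proposals:
        if not isinstance(item, dict):
            continue
        op, args = item.get("op"), item.get("args")
        if not isinstance(op, str) or not isinstance(args, dict):
            continue
        validator = ALLOWED_OPS.get(op)
        # Drop unknown operations, and known ones with unexpected arguments.
        if validator is not None and validator(args):
            accepted.append({"op": op, "args": args})
    return accepted
```

An injected `{"op": "push_commit", ...}` is dropped silently, because no model judgment sits between the proposal and the allow-list. Only the actor holds write credentials, so the reviewer never gets a path to egress.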
The need for this hardening narrows in three cases. Private repos with vetted contributors close the untrusted leg at access control. Purely read-only agents close the egress leg at the tool allowlist. A runtime can also hold no production secrets. Where two-agent separation is impractical, defense-in-depth covers the realistic surface: output secret scanning, a human merge gate, and a scoped `GITHUB_TOKEN`.
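Output secret scanning is the one defense-in-depth layer that is easy to sketch in code. This sketch assumes the scanner runs on every comment body before the workflow posts it. The token prefixes are GitHub's published token formats. The `scan_output` function and its `known_secrets` parameter are hypothetical; the parameter stands for values the runtime actually holds, for example read from its environment. Prefix and exact-value matching misses secrets that are encoded or split, so this remains one layer next to the human merge gate, not a replacement for it.

```python
import re

# GitHub token prefixes: ghp_ (classic PAT), gho_ (OAuth), ghu_ (user-to-server),
# ghs_ (GitHub App server-to-server), ghr_ (refresh), github_pat_ (fine-grained PAT).
TOKEN_PATTERN = re.compile(r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})")


def scan_output(text: str, known_secrets: list[str]) -> list[str]:
    """Return reasons to block `text`. An empty list means it may be posted."""
    findings = []
    if TOKEN_PATTERN.search(text):
        findings.append("github-token-pattern")
    for secret in known_secrets:
        # Exact-match check for values the runtime really holds. Skip empty strings.
        if secret and secret in text:
            findings.append("known-secret-value")
            break
    return findings
```

The workflow would post a comment only when `scan_output` returns an empty list and fail the step otherwise.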
Mixed review — the whole course
Question 1 · from Lesson 1: An execution path is unsafe only when it holds…
Question 2 · from Lesson 9: An agent that deletes an unasked file on a benign task is best fixed by…
Question 3 · from Lesson 11: A runtime control plane stops an injected tool call because the decision is…
Question 4 · from Lesson 17: A shared-index retriever leaks across tenants because it ranks by…
Question 5 · spaced recall from Lesson 18: Denial-of-wallet is dangerous specifically because…