Symptom to Mitigation

Eighteen lessons, one reflex: name the legs, locate the boundary, fix it below the model. Here are the full lookup table and the worked case.

Why this, for you: in a real incident you won't recite a threat model — you'll see a symptom and need the move. This capstone is the diagnostic table to keep, plus one end-to-end case (a CI/CD reviewer) that chains every lesson, and a mixed-review quiz across the whole course.

Everything in this course reduces to one loop: which legs of the trifecta are on this path, where is the trust boundary, and what deterministic control sits below the model to enforce it? The model is never the gate.
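The loop's first question (which legs are on this path?) is mechanical enough to write down. The sketch below assumes each execution path can be described by three booleans. `ExecutionPath`, `legs_present`, and `is_unsafe` are hypothetical names, not from any real framework.

```python
from dataclasses import dataclass


# Hypothetical description of one execution path: which capabilities the
# agent holds on it. Field names are illustrative only.
@dataclass(frozen=True)
class ExecutionPath:
    reads_private_data: bool       # secrets, tokens, private repos, user records
    ingests_untrusted_input: bool  # fetched pages, PR text, tool returns
    has_egress: bool               # network, comments, commits, any outbound write


def legs_present(path: ExecutionPath) -> set[str]:
    """Return the names of the trifecta legs present on this path."""
    legs = set()
    if path.reads_private_data:
        legs.add("private data")
    if path.ingests_untrusted_input:
        legs.add("untrusted input")
    if path.has_egress:
        legs.add("egress")
    return legs


def is_unsafe(path: ExecutionPath) -> bool:
    """A path is unsafe only when all three legs coexist on it."""
    return len(legs_present(path)) == 3


if __name__ == "__main__":
    before = ExecutionPath(reads_private_data=True, ingests_untrusted_input=True, has_egress=True)
    reviewer = ExecutionPath(reads_private_data=False, ingests_untrusted_input=True, has_egress=False)
    print(is_unsafe(before))    # True
    print(is_unsafe(reviewer))  # False
```

The check is plain data, not a prompt the model interprets, and that is the point. Its answer does not change when an injected instruction asks it to.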

1 The decision table

Symptom you observe	Mitigation	Lesson
Agent has private data, untrusted input, and egress on one path	Remove a leg — egress first for coding agents	1
Agent obeys instructions buried in a fetched page or repo file	Treat all retrieved bytes as untrusted; move defense below the model	2
A secret could appear in a prompt, log, or generated file	Inject as env var / wrapper script; audit the env per session	3
Agent could write a startup script or exfiltrate a read file	Dual-boundary sandbox at the OS level — both walls at once	4
Injected agent could connect to an attacker URL	Harness-enforced egress policy; deny rules override wildcard allows	5
Web page picks the agent's next consequential action	Plan-then-execute; quarantine extracted values from the planner	6
You're relying on one control to hold under attack	Layer independent mechanisms; prefer schema filtering over runtime rejection	7
One agent holds far more permission than its task needs	Least privilege; decompose into narrow chains; add a kill path	8
Agent overreaches on a benign task (deletes a file it was never asked to touch)	Pick an ask-to-continue framework or a deterministic deny-list, not a better model	9
Agent runs untrusted code on a shared, multi-tenant host	Choose a microVM runtime; patch the VMM as hard as the guest	10
Many MCP tools, each with its own ad-hoc authorisation	One runtime control plane; deterministic policy with argument inspection	11
A long-lived static API key sits on the agent runtime	Workload identity federation; pin the rule's match block narrowly	12
Agent installs a package name it invented	Lockfile-enforced install / mirror; gate install authority, not the model	13
Agent fetches a URL built from untrusted content	Public-web index gate; disable redirect-following; or strict egress	14
Agent output is executed or rendered downstream unchecked	Per-sink validation; parameterise, encode, allowlist — let the schema validate	15
An untrusted read plants a payload in long-term memory	User-only memory writes; deny tool-return sources; egress allow-list	16
Retrieval returns another tenant's chunk from a shared index	Gate the search space — pre-filter ABAC, post-filter the top-K	17
A crafted prompt or stolen key drains the bill unseen	Stack the five bounds; tuple-key on (user, repo, model); counters, not classifiers	18
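The Lesson 18 row's "counters, not classifiers" can be pictured as a deterministic spend counter keyed on the (user, repo, model) tuple. This sketch shows one bound only, a per-tuple token cap. It does not cover the five bounds the lesson stacks. `TupleBudget` and its limit are hypothetical.

```python
from collections import defaultdict

# (user, repo, model): the tuple every charge is keyed on.
Key = tuple[str, str, str]


class TupleBudget:
    """Hypothetical per-(user, repo, model) token cap enforced by a plain counter."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used: defaultdict[Key, int] = defaultdict(int)

    def charge(self, user: str, repo: str, model: str, tokens: int) -> bool:
        """Record the spend and return True, or return False if it would exceed the cap."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        key = (user, repo, model)
        if self.used[key] + tokens > self.max_tokens:
            return False  # refused before any spend; no model judgement involved
        self.used[key] += tokens
        return True


if __name__ == "__main__":
    budget = TupleBudget(max_tokens=1000)
    print(budget.charge("alice", "org/app", "model-a", 800))  # True
    print(budget.charge("alice", "org/app", "model-a", 300))  # False: 1100 > 1000
    print(budget.charge("alice", "org/app", "model-b", 300))  # True: separate tuple
```

A single per-tuple counter still lets one user spread spend across many repos or models. That is why it serves as one bound in a stack, alongside a per-user or global cap, rather than as the whole defense.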

When two controls compete, prefer the deterministic one that runs below the model. A prompt instruction is a tiebreaker, never the gate.
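Here is one example of a deterministic control below the model: the Lesson 5 row (deny rules override wildcard allows) written as a harness-side check. The host patterns and `egress_allowed` are hypothetical. A real harness would enforce this at its proxy or network layer, not in agent code.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical policy. Deny is checked first, so it overrides any allow wildcard.
ALLOW = ["github.com", "*.github.com", "pypi.org", "files.pythonhosted.org"]
DENY = ["gist.github.com", "*.githubusercontent.com"]


def egress_allowed(url: str) -> bool:
    """Return True only if the URL's host matches an allow pattern and no deny pattern."""
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:
        return False  # malformed URL: default deny
    if not host:
        return False  # no parseable host: default deny
    if any(fnmatchcase(host, pattern) for pattern in DENY):
        return False  # deny wins even if an allow wildcard also matches
    return any(fnmatchcase(host, pattern) for pattern in ALLOW)


if __name__ == "__main__":
    print(egress_allowed("https://api.github.com/repos"))   # True
    print(egress_allowed("https://gist.github.com/abc"))    # False: explicit deny
    print(egress_allowed("https://attacker.example/leak"))  # False: never allowed
```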

2 Worked case: the AI reviewer in CI/CD

The default "AI reviewer in GitHub Actions" setup is the trifecta by construction. It ingests PR titles, issue bodies, and comments, which are untrusted and attacker-writable on a public repo. The same runtime also holds GITHUB_TOKEN and pipeline secrets (private data) plus write tools (egress). The GitInject study found every tested provider susceptible in its default configuration. One variant was rated CVSS 9.4: a malicious PR title made the agent dump its environment into a public comment.

# BEFORE: one runtime holds untrusted input + secrets + write tools
on: pull_request_target          # runs with secrets, even for fork PRs
permissions: write-all
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with: { tools: "gh,git,filesystem" }   # illustrative inputs; reads the PR title, then can post + commit

Apply the table. Split the agent (Lesson 8 decomposition): a read-only reviewer ingests the untrusted content, and a separately credentialed actor receives only a structured, filtered allow-list of operations. The trifecta breaks at the workflow level, not at the prompt.

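# AFTER: two jobs; the reviewer never touches credentials
on: pull_request                 # fork PRs get no secrets and a read-only token
jobs:
  reviewer:
    runs-on: ubuntu-latest
    permissions: { contents: read, pull-requests: read }
    steps:
      - uses: anthropics/claude-code-action@v1
        with: { tools: filesystem, output: findings.json }   # illustrative inputs; no gh, no git
  actor:
    needs: reviewer
    runs-on: ubuntu-latest
    permissions: { pull-requests: write }   # post-only; consumes filtered findings.json, never raw PR text
    # findings.json arrives as an artifact; fork PRs cap this token at read, so post from a workflow_run workflow there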
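The actor job is only safe if what it consumes is narrow. One way to picture the "structured, filtered allow-list of operations" is a filter with three jobs. It parses findings.json, keeps only well-formed, allow-listed operations on files the PR actually changed, and drops everything else. The schema (`op`, `path`, `line`, `body`) and `filter_findings` are hypothetical. They are not claude-code-action's real output format.

```python
import json

# Hypothetical findings.json schema: a JSON list of
# {"op": str, "path": str, "line": int, "body": str} objects.
ALLOWED_OPS = {"comment"}  # post-only: no merge, push, label, or workflow edits
MAX_BODY = 2000            # arbitrary illustrative cap on comment length


def filter_findings(raw: str, changed_files: set[str]) -> list[dict]:
    """Keep only well-formed, allow-listed operations; silently drop everything else."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    safe = []
    for item in items:
        if not isinstance(item, dict):
            continue
        op, path, line, body = (item.get(k) for k in ("op", "path", "line", "body"))
        if not isinstance(op, str) or op not in ALLOWED_OPS:
            continue
        if not isinstance(path, str) or path not in changed_files:
            continue  # only files this PR actually touched
        if not isinstance(line, int) or isinstance(line, bool) or line < 1:
            continue
        if not isinstance(body, str) or len(body) > MAX_BODY:
            continue
        # Rebuild the record so unexpected extra keys never pass through.
        safe.append({"op": op, "path": path, "line": line, "body": body})
    return safe
```

The filter drops anything it cannot positively validate rather than trying to repair it. An injected request for any operation other than a line comment on a changed file therefore never reaches the credentialed job.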

Two-agent isolation dropped measured attack success to 0.31%; full read/write separation reached 0%. Architectural separation beats prompt mitigation because the model is not the gate.

Proportional, not blanket

The hardening can be scaled back when a leg is already closed. This holds for private repos with vetted contributors, where the untrusted leg closes at access control. It holds for pure read-only agents, where the egress leg closes at the tool allowlist. It also holds for runtimes that hold no production secrets, where the private-data leg is absent. Where two-agent separation is impractical, defense-in-depth covers the realistic surface: output secret scanning, a human merge gate, and a scoped GITHUB_TOKEN.
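Of the fallback layers, output secret scanning is the simplest to sketch: block an outgoing comment if it contains a known secret value or a token-shaped string. The patterns below are assumptions about common token formats: GitHub's `ghp_`/`ghs_`-style prefixes, `github_pat_`, and AWS `AKIA` key IDs. `gate_comment` is a hypothetical hook. A maintained scanner ruleset covers far more.

```python
import re
from typing import Optional

# Assumed token shapes; real scanners ship far larger, maintained rulesets.
SECRET_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),    # GitHub tokens (e.g. ghs_ for Actions)
    re.compile(r"github_pat_[A-Za-z0-9_]{22,}"),  # GitHub fine-grained PATs
    re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key IDs
]


def gate_comment(body: str, known_secrets: list[str]) -> Optional[str]:
    """Return the body if it looks safe to post, or None to block it."""
    # Exact match on secret values the runtime actually holds.
    if any(secret and secret in body for secret in known_secrets):
        return None
    # Pattern match for token-shaped strings the runtime may not know about.
    if any(pattern.search(body) for pattern in SECRET_PATTERNS):
        return None
    return body
```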

↪ Your win: a reflex you can run in any incident

Name the legs on the affected execution path — private data, untrusted input, egress.
Locate the trust boundary and the symptom in the table above.
Apply the deterministic control that runs below the model; demote prompt rules to last.
Re-audit the new target the fix created — resolver, proxy, matcher, sandbox edge.
Right-size it — closed pipelines and read-only agents need less; weigh cost against surface.

Mixed review — the whole course

Question 1 · from Lesson 1: An execution path is unsafe only when it holds…

Question 2 · from Lesson 9: An agent that deletes an unasked file on a benign task is best fixed by…

Question 3 · from Lesson 11: A runtime control plane stops an injected tool call because the decision is…

Question 4 · from Lesson 17: A shared-index retriever leaks across tenants because it ranks by…

Question 5 · spaced recall from Lesson 18: Denial-of-wallet is dangerous specifically because…

You've finished the Security course. Keep the Glossary as your canonical vocabulary.
