Capstone

Security · ~9 min

Symptom to Mitigation

Eighteen lessons, one reflex: name the legs, locate the boundary, fix it below the model. Here's the full lookup table and the worked case.

Why this, for you: in a real incident you won't recite a threat model — you'll see a symptom and need the move. This capstone is the diagnostic table to keep, plus one end-to-end case (a CI/CD reviewer) that chains every lesson, and a mixed-review quiz across the whole course.

Everything in this course reduces to one loop: which legs of the trifecta are on this path, where is the trust boundary, and what deterministic control sits below the model to enforce it? The model is never the gate.
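The first question in that loop (which legs are on this path?) can be sketched as a small deterministic check. This is a minimal illustration, not the course's implementation; `ExecutionPath`, `legs_present`, and `is_unsafe` are hypothetical names, and the assumption is that each path can be described by three boolean flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionPath:
    """One path an agent can take; each flag marks one trifecta leg (hypothetical model)."""
    name: str
    private_data: bool     # can read secrets or private content
    untrusted_input: bool  # ingests attacker-writable bytes
    egress: bool           # can send data out (network, comments, commits)


def legs_present(path: ExecutionPath) -> list[str]:
    """Return the names of the trifecta legs that are present on this path."""
    legs = []
    if path.private_data:
        legs.append("private_data")
    if path.untrusted_input:
        legs.append("untrusted_input")
    if path.egress:
        legs.append("egress")
    return legs


def is_unsafe(path: ExecutionPath) -> bool:
    """A path is flagged only when all three legs sit on it at once."""
    return len(legs_present(path)) == 3


default_reviewer = ExecutionPath("ci-reviewer", True, True, True)
read_only_reviewer = ExecutionPath("ci-reviewer-no-egress", True, True, False)
```

Removing any one leg, as with `read_only_reviewer`, turns the flag off. The check itself is plain code that runs outside the model.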

1 The decision table

Symptom you observe | Mitigation | Lesson
--- | --- | ---
Agent has private data, untrusted input, and egress on one path | Remove a leg; egress first for coding agents | 1
Agent obeys instructions buried in a fetched page or repo file | Treat all retrieved bytes as untrusted; move defense below the model | 2
A secret could appear in a prompt, log, or generated file | Inject as env var / wrapper script; audit the env per session | 3
Agent could write a startup script or exfiltrate a read file | Dual-boundary sandbox at the OS level, both walls at once | 4
Injected agent could connect to an attacker URL | Harness-enforced egress policy; denies override allow wildcards | 5
Web page picks the agent's next consequential action | Plan-then-execute; quarantine extracted values from the planner | 6
You're relying on one control to hold under attack | Layer independent mechanisms; prefer schema filtering over runtime rejection | 7
One agent holds far more permission than its task needs | Least privilege; decompose into narrow chains; add a kill path | 8
Agent overreaches on a benign task (deletes the unasked file) | Pick an ask-to-continue framework or deterministic deny-list, not a better model | 9
Agent runs untrusted code on a shared, multi-tenant host | Choose a microVM runtime; patch the VMM as hard as the guest | 10
Many MCP tools, each with its own ad-hoc authorisation | One runtime control plane; deterministic policy with argument inspection | 11
A long-lived static API key sits on the agent runtime | Workload identity federation; pin the rule's match block narrowly | 12
Agent installs a package name it invented | Lockfile-enforced install / mirror; gate install authority, not the model | 13
Agent fetches a URL built from untrusted content | Public-web index gate; disable redirect-following; or strict egress | 14
Agent output is executed or rendered downstream unchecked | Per-sink validation; parameterise, encode, allowlist; let the schema validate | 15
An untrusted read plants a payload in long-term memory | User-only memory writes; deny tool-return sources; egress allow-list | 16
Retrieval returns another tenant's chunk from a shared index | Gate the search space: pre-filter ABAC, post-filter the top-K | 17
A crafted prompt or stolen key drains the bill unseen | Stack the five bounds; tuple-key on (user, repo, model); counters not classifiers | 18

When two controls compete, prefer the deterministic one that runs below the model. A prompt instruction is a tiebreaker, never the gate.
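As one illustration of a deterministic control from the table, the Lesson 5 rule that denies override allow wildcards can be sketched as a host check. This is an assumption-laden sketch: the `ALLOW` and `DENY` patterns and the `egress_allowed` helper are hypothetical, and a real harness would enforce the policy at the network layer rather than in application code.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: wildcard allows plus explicit denies.
ALLOW = ["*.github.com", "pypi.org"]
DENY = ["gist.github.com"]  # matches an allow wildcard, but is still blocked


def egress_allowed(host: str, allow=ALLOW, deny=DENY) -> bool:
    """Deny rules are checked first, so they win over any allow wildcard.

    Hosts that match no rule at all are refused (default deny).
    """
    host = host.lower().rstrip(".")
    if any(fnmatchcase(host, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(host, pattern) for pattern in allow)
```

Because the deny list is evaluated before the allow list, widening an allow wildcard later cannot silently reopen a blocked destination.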

2 Worked case: the AI reviewer in CI/CD

The default "AI reviewer in GitHub Actions" is the trifecta by construction: it ingests PR titles, issue bodies, and comments (untrusted, attacker-writable on a public repo) while the same runtime holds GITHUB_TOKEN and pipeline secrets (private data) plus write tools (egress). The GitInject study found every tested provider susceptible in default config — one variant rated CVSS 9.4 after a malicious PR title dumped env to a public comment.

    # BEFORE: one runtime holds untrusted input + secrets + write tools
    on: pull_request_target        # runs with secrets on fork PRs
    permissions: write-all
    - uses: claude-code-action@v1
      with: { tools: "gh,git,filesystem" }   # reads PR title, then can post + commit

Apply the table. Split the agent (Lesson 8 decomposition): a read-only reviewer ingests untrusted content; a separately-credentialed actor receives only a structured, filtered allow-list of operations. The trifecta breaks at the workflow level — not the prompt.

    # AFTER: two runtimes, and the reviewer never touches credentials
    reviewer:
      on: pull_request             # no secrets
      permissions: { contents: read, pull-requests: read }
      with: { tools: filesystem, output: findings.json }   # no gh, no git
    actor:
      needs: reviewer
      permissions: { pull-requests: write }   # post-only; reads no attacker bytes

Two-agent isolation dropped measured attack success to 0.31%; full read/write separation reached 0%. Architectural separation beats prompt mitigation because the model is not the gate.
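The actor side of that split, which accepts only a structured, filtered allow-list of operations, might look like the sketch below. The shape of `findings.json` is not specified in this lesson, so the field names (`op`, `path`, `line`, `body`), the `ALLOWED_OPS` set, and the size cap are all assumptions made for illustration.

```python
import json

ALLOWED_OPS = {"comment"}  # assumption: the actor may only post review comments
REQUIRED_KEYS = {"op", "path", "line", "body"}  # assumed schema for one finding
MAX_BODY_CHARS = 2000  # assumed cap on comment size


def filter_findings(raw: str) -> list[dict]:
    """Parse reviewer output and keep only well-formed, allow-listed operations.

    Anything that does not match the expected shape exactly is dropped,
    never repaired or interpreted.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []

    accepted = []
    for item in data:
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            continue
        op, path, line, body = item["op"], item["path"], item["line"], item["body"]
        if not isinstance(op, str) or op not in ALLOWED_OPS:
            continue
        if not isinstance(path, str) or not isinstance(body, str):
            continue
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(line, bool) or not isinstance(line, int) or line < 1:
            continue
        if len(body) > MAX_BODY_CHARS:
            continue
        accepted.append(item)
    return accepted
```

The actor never sees the PR title or comments directly. It sees only entries that survive this filter, so an injected instruction such as a `commit` operation is dropped before any credentialed call is made.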

Proportional, not blanket

The hardening narrows in three cases: private repos with vetted contributors (the untrusted-input leg closes at access control), pure read-only agents (the egress leg closes at the tool allowlist), and runtimes that hold no production secrets. Where two-agent separation is impractical, defense-in-depth (output secret scanning, a human merge gate, and a scoped GITHUB_TOKEN) covers the realistic surface.
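Output secret scanning, the first of those fallback layers, can be sketched as a pattern check that runs on agent output before anything is posted. The two patterns below (GitHub token prefixes and AWS access key IDs) are illustrative and far from exhaustive; `contains_secret` and `redact` are hypothetical helper names.

```python
import re

# Illustrative patterns only; a real scanner would cover many more formats.
SECRET_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),  # GitHub token prefixes (ghp_, gho_, ...)
    re.compile(r"AKIA[0-9A-Z]{16}"),            # AWS access key ID format
]


def contains_secret(text: str) -> bool:
    """Return True if any known secret pattern appears in the text."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def redact(text: str) -> str:
    """Replace every matched secret with a fixed placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

A scanner like this is a counter-style control: it does not ask the model whether the output is safe, it blocks or redacts on a match. It complements the human merge gate and scoped token but does not replace them.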

↪ Your win: a reflex you can run in any incident

Mixed review — the whole course

Question 1 · from Lesson 1
An execution path is unsafe only when it holds…

Question 2 · from Lesson 9
An agent that deletes an unasked file on a benign task is best fixed by…

Question 3 · from Lesson 11
A runtime control plane stops an injected tool call because the decision is…

Question 4 · from Lesson 17
A shared-index retriever leaks across tenants because it ranks by…

Question 5 · spaced recall from Lesson 18
Denial-of-wallet is dangerous specifically because…

You've finished the Security course. Keep the Glossary as your canonical language.