Reference · Canonical Language

Security — Glossary

The working vocabulary for the Security course. Once a term lives here, every lesson uses this word for it. Grows as we go.

The Threat Model

Lethal trifecta
Three capabilities that are harmless alone but exploitable together on one execution path: private data, untrusted input, and external egress. Remove any one leg and the exfiltration path closes.
Avoid: "the agent is risky" — name which legs are present on which path.
Source: lethal-trifecta-threat-model.md · Willison 2025
Execution path
The unit of a trifecta audit. One agent runs several paths; only a path holding all three legs is unsafe. Audit per path, not per agent — a three-"Yes" path demands an architectural fix, not a prompt revision.
Avoid: auditing "the agent" as one undifferentiated unit.
Source: lethal-trifecta-threat-model.md
Prompt injection
An attack where malicious instructions in content the agent reads are followed as if they came from the user or system prompt. Direct = the user types it; indirect = it rides in on retrieved content.
Avoid: "jailbreak" — injection is about provenance confusion, not safety-training bypass.
Source: prompt-injection-threat-model.md · OpenAI
Indirect injection · indirect prompt injection
Injection where the payload arrives through content the agent retrieves itself — a page, repo file, MCP response, PDF, or dependency metadata. The surface most developers forget, because clean-environment testing never exercises it.
Source: indirect-injection-discovery.md
Provenance-blindness
The reason injection works: transformer attention processes every token uniformly, with no architectural channel marking origin. Injected instructions share the same token space as legitimate ones and carry no origin metadata.
Avoid: "the model got confused" — the model has no information that could distinguish the sources.
Source: prompt-injection-threat-model.md
Rules-file injection · rules file backdoor
A repository-based vector: malicious instructions in auto-processed config files (CLAUDE.md, .cursorrules, .github/copilot-instructions.md) that bypass user review when a repo opens.
Source: indirect-injection-discovery.md · Pillar Security

Containing the Damage

Credential injection
Supplying secrets as environment variables set before the agent starts — inherited by child processes but never transmitted as text through a tool call. The default secrets pattern; never paste a key into a prompt.
Avoid: "store the key safely" — the point is the key never enters context at all.
Source: secrets-management-for-agents.md
Wrapper script
A script that consumes a credential internally and returns only its output. The agent calls it by name with intent; the raw secret never appears in the tool input or context window.
Source: secrets-management-for-agents.md
Dual-boundary sandboxing
Enforcing filesystem and network isolation simultaneously, at the OS level. Filesystem-only allows network exfiltration; network-only allows filesystem-based privilege escalation. Neither boundary alone contains an agent.
Avoid: "sandboxed" when only one boundary is enforced — that's a leak waiting to happen.
Source: dual-boundary-sandboxing.md
Approval fatigue
The failure mode of granular per-action prompts: users click "approve" without reading — the illusion of oversight with none of the substance. A dual-boundary sandbox replaces most prompts with a hard safe zone.
Avoid: treating an approval gate as oversight when users habituate to it.
Source: dual-boundary-sandboxing.md
Blast radius · least privilege, permission scoping
The bound on damage a compromised agent can do, set by the permissions you grant. Every unneeded permission is attack surface. Scope tools, files, mode, and repo access; decompose broad agents into narrow chains.
Source: blast-radius-containment.md

Architecting the Defense

Egress policy · domain allow/deny
A harness-enforced domain allow/deny list that rejects a connection before it leaves the process, regardless of what the model produced. Moves egress out of the model's trust boundary. Denies must override allow wildcards.
Avoid: letting the model decide which URLs to reach — injection defeats that immediately.
Source: agent-network-egress-policy.md
Allow-first vs deny-first
Two egress postures. Allow-first + default-deny: block everything not explicitly allowed (regulated, high-sensitivity, cloud runners). Deny-first: allow everything not explicitly blocked (interactive dev loops).
Source: agent-network-egress-policy.md
Matcher trust boundary
Once the egress check lives in the harness, the matcher is the boundary — one parser bug bypasses every policy (e.g. a SOCKS5 null-byte that passes endsWith() but truncates in getaddrinfo()). Keep a lower-layer enforcement point that doesn't trust the parser.
Source: agent-network-egress-policy.md
Plan-then-execute
Committing to a typed, task-specific program before any untrusted page is observed. Page content can populate values inside the fixed graph but cannot synthesize new actions or redefine the task. The default for web agents.
Avoid: ReAct for web agents taking consequential actions over multi-party content.
Source: plan-then-execute-web-agents.md · Piet et al. 2026
Control/data-flow separation · CaMeL, quarantined channel
The structural family behind plan-then-execute: a privileged channel carries control flow from the trusted user task; a quarantined channel handles untrusted content with no authority to alter what runs.
Source: camel-control-data-flow-injection.md · Debenedetti et al. 2025
GitInject · CI/CD prompt injection
The anti-pattern of an AI reviewer in CI/CD that ingests PR/issue text (untrusted) while holding repo-write tokens and secrets (private data + egress) in one runtime — the lethal trifecta on every run. Fix: two-runtime separation.
Source: ai-agents-in-ci-cd-with-elevated-permissions.md · Isbarov et al. 2026

Layering the Defense

Defense in depth · layered safety
Multiple independent safety mechanisms, each at a different level of the stack, so the failure of any one does not compromise the others. Assumes every single layer eventually gets bypassed under adaptive attack.
Avoid: trusting one "strong" control — any individual defense can be circumvented.
Source: defense-in-depth-agent-safety.md · Bui 2026
Schema-level tool filtering
Removing a tool from the model's schema so it never sees the tool exists — stronger than runtime rejection, because the model cannot form the intent to call a tool it cannot see. The attack surface shrinks before inference.
Source: defense-in-depth-agent-safety.md
Overeager action · out-of-scope action
An operation outside the user's authorised scope on a benign task — not injection, not escape, an authorisation failure. Driven more by the permission framework than the base model: identical weights span 1.1%–27.7%.
Avoid: "tune the model" — pick the framework (ask-to-continue or deny-list) first.
Source: permission-framework-over-model.md · Qu et al. 2026
Sandbox runtime · container, microVM, OS-level isolator
The runtime that enforces a sandbox, traded along isolation-vs-startup-cost. Containers: fast, kernel-shared. MicroVMs: hardware-isolated, ~125 ms boot — the pick for untrusted/multi-tenant code. OS-level isolators (bubblewrap, Seatbelt): no daemon, fastest, weakest against escape.
Source: sandbox-runtime-comparison.md

The Identity & Supply Surface

Runtime control plane · MCP policy gateway
A single policy evaluation point between agent and tool that intercepts every call — identity, tool name, arguments, rate limits — and forwards or denies deterministically. ~27% prompt-only violations vs 0% at the app layer. Sees only traffic that traverses it; off-protocol actions bypass it.
Avoid: tool-name policies without argument inspection — a pre-approved tool with hostile args is still RCE.
Source: mcp-runtime-control-plane.md
Workload identity federation · WIF, keyless auth
Replacing a long-lived API key with a short-lived token minted from a signed OIDC JWT the runtime already holds. The federation rule's claim-match block becomes the security boundary; token lifetime is capped so it can't outlive the upstream identity. Does not close the workload-attestation gap.
Avoid: leaving ANTHROPIC_API_KEY="" — empty still shadows federation; unset it.
Source: workload-identity-federation-for-agents.md
Slopsquatting · package hallucination
A supply-chain attack: an LLM recommends a nonexistent package, an attacker pre-registers the name, the agent installs malware. 43% of hallucinated names recur across re-runs, making them enumerable; 48.6% sit far from any real name, so typosquat detectors miss them. Defense is install authority, not model behavior.
Source: slopsquatting-hallucinated-package-names.md · Spracklen et al. 2025
URL exfiltration · query-string exfiltration
Leaking private data in the query string of a fetched URL — the request itself is the channel, before any response is read. Domain allow-lists ask the wrong question; a public-web index gate asks whether the URL could encode user-specific data. Covers query strings only, not DNS/timing/header channels.
Source: url-exfiltration-guard.md

The Output & Data Surface

Improper output handling · OWASP LLM05, downstream sink
Agent output executed, rendered, or interpreted by a downstream sink — shell, SQL, HTML renderer, file path, package manager — without per-sink validation. Trust does not transfer through a string boundary; treat the model as any other user and validate its output at each sink. The controls are old; the applicability surface is new.
Avoid: conflating it with LLM06 Excessive Agency — that bounds the agent's actions; LLM05 validates its output.
Source: improper-output-handling-downstream-sinks.md · OWASP LLM05:2025
Memory poisoning · Trojan Hippo, dormant payload
A persistence attack: one untrusted read plants a dormant instruction in long-term memory that activates sessions later when the user raises a sensitive topic, exfiltrating data. Composes the lethal trifecta across sessions, so a per-session audit passes each half and misses the pivot. Fix: user-only memory writes; deny tool-return sources.
Avoid: trusting single-session injection resistance to transfer — write-time review runs in a context lacking the trigger.
Source: trojan-hippo-memory-attack.md · Das et al. 2026
Relevance-authorization gap · multitenant RAG leak, ABAC-gated retrieval
Retrieval ranks by relevance, which carries no signal about who is asking; in a shared index the top-scoring chunk for one tenant can belong to another. Fix the search space, not the output: pre-filter candidates by ABAC at the index, then post-filter the top-K to catch ANN bypass. Authorize the record, not just the tool call.
Avoid: application-layer post-filtering alone — the vector DB already spent its top-K budget on forbidden chunks.
Source: multitenant-rag-authorization-gap.md · Arceo & Narsing 2026

The Resource Surface

Unbounded consumption · OWASP LLM10, denial-of-wallet
Resource and cost exhaustion as a threat. An LLM call's cost is variable and attacker-influenceable, priced linearly, so requests-per-second does not bind dollars-per-second. One control surface serves two owners — DoS (availability) and denial-of-wallet (finance) — and the bill drains while availability metrics stay healthy.
Avoid: a single fixed RPS limit — it misses low-and-slow wallet attacks and breaks legitimate bursty workflows.
Source: unbounded-consumption-resource-bounds.md · OWASP LLM10:2025
The five bounds
The complementary controls for unbounded consumption, each keying on a different unit of cost: per-call token cap (output size), per-task iteration cap (loop depth), fan-out concurrency cap (parallel breadth), cost-velocity breaker (rolling $/min), and per-day dollar budget (absolute ceiling). Tuple-key on (user, repo, model); keep deterministic counters in the enforcement path, classifiers in detection only.
Source: unbounded-consumption-resource-bounds.md