Part 4 · Layering the Defense

Security · ~7 min

Bound the Blast Radius

When a layer fails — and one will — the damage is capped by the permissions you granted. Every permission the task doesn't need is attack surface you chose to keep.

Why this, for you: Lesson 7 said assume every layer fails. This lesson answers the next question: when it does, how bad is it? Least privilege is the dial that sets the ceiling on damage — and unlike a prompt rule, it's enforced by the runtime, so even a successful injection can't exceed it.

Anthropic frames the whole trade-off as risk = likelihood × damage. Defense-in-depth lowers likelihood; permission scoping lowers damage. The damage a compromised agent can do is bounded by the permissions you grant it, and that bound is structural: the runtime filters the tool list before the model sees it, so a tool you never granted is never on offer.

1 Four dimensions to scope per agent

"Least privilege" is not one knob. Scope each of these independently to the task definition:

| Dimension | The question | Example bound |
| --- | --- | --- |
| Tool access | Which tools can it invoke? | Research agent: Read, not Write or Bash |
| File scope | Which files can it touch? | Worktree limited to docs/, never .github/ |
| Permission mode | What human interaction? | acceptEdits vs ask-on-first-use |
| Repo access | What can it push? | Copilot pushes only to copilot/ branches, never main |

Tool restrictions in agent frontmatter are enforced by the runtime, not the model — the tools field controls what the runtime exposes, not what the model requests. A successful injection cannot invoke a tool the runtime never made available.
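
To make the distinction concrete, here is a minimal Python sketch of runtime-side filtering. It is not any real agent framework's API: `AgentRuntime`, `ToolNotExposed`, and the toy `registry` are hypothetical names invented for illustration, and the `allowed` list merely stands in for the frontmatter tools field.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ToolNotExposed(Exception):
    """Raised when a call names a tool the runtime never exposed."""


@dataclass
class AgentRuntime:
    # Hypothetical registry of every tool the host knows about.
    registry: Dict[str, Callable[[str], str]]
    # Allowlist standing in for the frontmatter `tools` field (assumption).
    allowed: List[str] = field(default_factory=list)

    def exposed_tools(self) -> Dict[str, Callable[[str], str]]:
        # Filtering happens here, before anything is offered to the model.
        return {name: fn for name, fn in self.registry.items() if name in self.allowed}

    def invoke(self, name: str, arg: str) -> str:
        tools = self.exposed_tools()
        if name not in tools:
            # The request fails no matter what text persuaded the model to make it.
            raise ToolNotExposed(name)
        return tools[name](arg)


# Toy tools that only return strings; nothing touches the file system.
registry = {
    "Read": lambda path: f"contents of {path}",
    "Write": lambda path: f"wrote {path}",
    "Bash": lambda cmd: f"ran {cmd}",
}
research = AgentRuntime(registry=registry, allowed=["Read"])
```

Calling `research.invoke("Bash", "...")` raises `ToolNotExposed`. The check lives in the runtime, so no wording in the prompt or in injected content changes the result.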

2 Decompose, don't broaden

Rather than one agent with broad permissions, chain narrow-scoped agents — each holding only the permissions for one operation. A documentation pipeline splits cleanly:

# Three chained agents: each injection bounded to its operation
research: tools=[Read, WebFetch]  permissions.allow=[]               # writes nothing
draft:    tools=[Read, Write]     allow=["Write(docs/drafts/**)"]  # no network
review:   tools=[Read, Bash]      allow=["Bash(gh pr comment*)"]   # no push, no write

A prompt injection into the research agent cannot write files; an injection into the draft agent cannot push to remote. The broad agent's worst case is replaced by three narrow ones.
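
The bounds on this pipeline can be checked mechanically. The sketch below is a rough Python model, not a real permission engine. `PROFILES`, `GATED`, and `is_permitted` are hypothetical names, and two things are assumptions made for illustration: that only side-effecting tools need an allow rule, and that rules are matched with simple glob patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical profiles mirroring the three-agent pipeline above.
PROFILES = {
    "research": {"tools": {"Read", "WebFetch"}, "allow": []},
    "draft": {"tools": {"Read", "Write"}, "allow": ["Write(docs/drafts/**)"]},
    "review": {"tools": {"Read", "Bash"}, "allow": ["Bash(gh pr comment*)"]},
}

# Assumption: side-effecting tools need a matching allow rule; read-only tools do not.
GATED = {"Write", "Bash"}


def is_permitted(agent: str, tool: str, argument: str) -> bool:
    """Return True if this agent's profile permits the given tool call."""
    profile = PROFILES[agent]
    if tool not in profile["tools"]:
        return False  # the tool was never exposed to this agent
    if tool not in GATED:
        return True
    call = f"{tool}({argument})"
    # Glob matching is a simplification of how a real runtime evaluates rules.
    return any(fnmatchcase(call, pattern) for pattern in profile["allow"])
```

With these profiles, `is_permitted("research", "Write", "docs/drafts/a.md")` is False and `is_permitted("review", "Bash", "git push origin main")` is False, matching the two claims above. A production check would also need to normalize paths and parse commands, which a plain glob does not do.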

3 Bounded radius is not bounded duration

Scoping caps per-action damage. It does not cap time-integrated damage on its own.

60% of orgs can't stop a misbehaving agent

A Kiteworks 2026 report found 60% of organizations cannot terminate a misbehaving agent. A narrowly-scoped agent still accumulates damage between detection and shutdown if there's no kill switch. Pair permission scoping with a termination path the agent cannot block — a supervisor heartbeat, a harness circuit breaker, or an external orchestrator timeout — so bounded radius and bounded duration hold together.
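
One of the termination paths named above, the external orchestrator timeout, can be sketched in a few lines of Python. This is a minimal illustration that assumes the agent runs as a child process of the orchestrator. `run_with_deadline` is a hypothetical helper, and the demo command is a stand-in that simply sleeps.

```python
import subprocess
import sys
from typing import List


def run_with_deadline(command: List[str], deadline_seconds: float) -> str:
    """Run an agent process and kill it from outside if it overruns the deadline."""
    process = subprocess.Popen(command)
    try:
        process.wait(timeout=deadline_seconds)
        return "completed"
    except subprocess.TimeoutExpired:
        process.kill()  # issued by the parent; the child gets no say
        process.wait()  # reap the child so no zombie process is left behind
        return "killed"


# Stand-in for a misbehaving agent: a process that would run for 30 seconds.
STUCK_AGENT = [sys.executable, "-c", "import time; time.sleep(30)"]
```

`run_with_deadline(STUCK_AGENT, 0.5)` returns "killed" after about half a second. The deadline lives in the parent process, outside anything the agent's tools can reach. A real deployment would also need to handle any processes the agent itself spawned, which this sketch does not.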

Before deployment, run the audit: What's the broadest action this agent could take? If injected, what's the worst-case outcome? Which permissions exist for convenience, not necessity? Remove every permission you can't justify from the task definition.
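
The last audit question amounts to a set difference between what was granted and what the task definition requires. A small Python sketch follows; the helper name and both permission sets are hypothetical examples, not taken from a real deployment.

```python
from typing import List, Set


def unjustified_permissions(granted: Set[str], required_by_task: Set[str]) -> List[str]:
    """List permissions that were granted but that the task definition does not require."""
    return sorted(granted - required_by_task)


# Hypothetical grant list for a drafting agent, and what its task actually needs.
granted = {"Read", "Write(docs/drafts/**)", "Bash(git push*)", "WebFetch"}
required = {"Read", "Write(docs/drafts/**)"}

to_remove = unjustified_permissions(granted, required)
```

Here `to_remove` is `["Bash(git push*)", "WebFetch"]`: the convenience permissions to drop before deployment.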

↪ Your win: cap the damage before it happens

Retrieval practice — recall, don't peek

Question 1: Permission scoping primarily lowers which term of risk = likelihood × damage?

Question 2: The tools field in agent frontmatter is enforced by…

Question 3: Decomposing one broad agent into a narrow chain reduces…

Question 4: Scoping bounds per-action damage but not duration, so pair it with…

Question 5 (spaced recall from Lesson 7): Schema-level tool filtering is stronger than runtime rejection because…

Ask me anything. Want to write least-privilege profiles for a pipeline you're building, or design a kill switch the agent can't disable? Next in Part 4: The Framework Is the Knob — why the permission framework moves overeager-action rates more than the model does.