Part 2 · Containing the Damage

Security · ~7 min

Two Walls, Not One

A filesystem wall without a network wall leaks. A network wall without a filesystem wall gets rewritten. You need both, enforced below the prompt.

Why this, for you: a sandbox is where you cash in the trifecta. Done right, it gives the agent a safe zone to work freely and a hard edge it cannot cross — replacing the click-through approval prompts that train users to stop reading. This lesson is the concrete OS-level mechanics.

Restricting an agent to its working directory does not contain it. The two boundaries fail in opposite, complementary ways — which is exactly why one alone is never enough.

1 Why one wall always leaks

Wall present | Wall missing | The leak
Filesystem | Network | Reads any file it can reach, then exfiltrates it over an open outbound connection
Network | Filesystem | Writes to a startup script or crontab that runs with elevated rights on the next trigger
Filesystem-only sandboxing allows network exfiltration; network-only allows filesystem-based privilege escalation. Both boundaries must be enforced at once — and at the OS level, not the prompt.

2 Enforce below the model

Prompt-level restrictions can be bypassed by a confused or injected agent. OS-level restrictions cannot be overridden by prompt content alone. On Linux, bubblewrap enforces both walls in one invocation:

# $PROJECT_DIR is the only persistent writable path; /tmp is a throwaway tmpfs.
# --unshare-net leaves the agent with no network at all.
bwrap \
  --ro-bind /usr /usr --ro-bind /lib /lib \
  --bind "$PROJECT_DIR" "$PROJECT_DIR" \
  --proc /proc --dev /dev --tmpfs /tmp \
  --unshare-net \
  --die-with-parent \
  -- claude

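--bind "$PROJECT_DIR" "$PROJECT_DIR" grants persistent write access only to the working directory; --ro-bind mounts system paths read-only; --tmpfs /tmp provides scratch space that disappears when the sandbox exits; --unshare-net removes network access entirely. To allowlist domains, swap --unshare-net for a network namespace routed through a validating proxy.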
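The decision a validating proxy has to make can be sketched in a few lines of shell. This is an illustration only, not a proxy: the hostnames in ALLOWED_HOSTS and the function names is_allowed_host and proxy_decision are hypothetical, and a real deployment would put this logic inside the proxy that the sandbox's network namespace is routed through.

```shell
#!/bin/sh
# Hypothetical allowlist: placeholder hostnames, not taken from the lesson.
ALLOWED_HOSTS="api.example.com registry.example.org"

# Succeeds only when the host exactly equals an allowlist entry.
# Exact matching rejects suffix tricks such as "api.example.com.evil.test".
is_allowed_host() (
  # Hostnames are case-insensitive, so compare in lowercase.
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Drop the trailing dot of a fully qualified name.
  host=${host%.}
  for allowed in $ALLOWED_HOSTS; do
    [ "$host" = "$allowed" ] && exit 0
  done
  exit 1
)

# Prints the decision for a CONNECT-style target of the form "host:port".
proxy_decision() {
  if is_allowed_host "${1%:*}"; then echo allow; else echo deny; fi
}

proxy_decision "api.example.com:443"
proxy_decision "api.example.com.evil.test:443"
```

Exact matching is a deliberate choice here: substring or suffix matching is the usual way an allowlist quietly turns into an open door.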

Platform | Primitive
Linux | bubblewrap (namespaces + seccomp); network namespaces
Containers | Docker / Podman with restricted mounts and network policy; docker sbx
macOS | Seatbelt via sandbox-exec (deprecated since 10.13; prefer containers)
Strict isolation | microVMs (Firecracker, Kata) or gVisor: own kernel, no shared-kernel escape
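For the Containers row, a rough Docker equivalent of the bubblewrap invocation might look like the sketch below. The image name agent-sandbox:latest, the /work mount point, and the run_sandboxed helper are assumptions made for illustration; the docker run flags themselves are standard. With DRY_RUN=1 the helper only prints the arguments, so it can be inspected without Docker installed.

```shell
#!/bin/sh
# Hypothetical image name; substitute an image that contains your agent.
AGENT_IMAGE="agent-sandbox:latest"

# Run a command in a container with both walls up.
# With DRY_RUN=1 the docker arguments are printed one per line instead of executed.
run_sandboxed() {
  project_dir=$1; shift
  # Network wall: --network none. Filesystem wall: read-only root,
  # a throwaway /tmp, and a single writable bind mount at /work.
  set -- run --rm \
    --network none \
    --read-only \
    --tmpfs /tmp \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    --volume "$project_dir:/work" \
    --workdir /work \
    "$AGENT_IMAGE" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    docker "$@"
  fi
}

DRY_RUN=1 run_sandboxed "$PWD" claude
```

A container still shares the host kernel, so this sits in the same trust tier as bubblewrap rather than the microVM tier.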

3 The sandbox replaces approval fatigue

Granular per-action prompts produce approval fatigue: users click "approve" without reading — the illusion of oversight with none of the substance. A dual-boundary sandbox defines a safe zone (CWD + allowlisted domains) where the agent acts freely, and reserves prompts for genuine boundary crossings.
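One way to picture "prompt only on boundary crossings" is a small policy check for write targets. The sketch below is an assumption about how such a check could look, not how any particular agent implements it; physical_dir and write_decision are hypothetical names. It decides when to ask the user and does not replace the OS-level wall, which still does the enforcing.

```shell
#!/bin/sh
# Print the physical path of an existing directory (symlinks and ".." resolved), or fail.
physical_dir() ( cd -P -- "$1" 2>/dev/null && pwd -P )

# Print "auto" when a write target sits inside the safe zone, "prompt" otherwise.
# Anything that cannot be resolved, or that is itself a symlink, fails closed to "prompt".
write_decision() (
  zone=$(physical_dir "$1") || { echo prompt; exit 0; }
  target=$2
  if [ -L "$target" ]; then echo prompt; exit 0; fi
  if [ -d "$target" ]; then
    resolved=$(physical_dir "$target")
  else
    parent=$(physical_dir "$(dirname "$target")") || { echo prompt; exit 0; }
    resolved=$parent/$(basename "$target")
  fi
  case "$resolved" in
    "$zone"|"$zone"/*) echo auto ;;
    *)                 echo prompt ;;
  esac
)

write_decision "$PWD" "$PWD/notes.txt"
write_decision "$PWD" "/etc/crontab"
```

Outbound connections would get the analogous treatment: allowlisted domains proceed silently, everything else raises a prompt.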

The sandbox is a baseline, not a guarantee

Three documented escapes. Shared-kernel CVEs: a kernel bug turns a namespace sandbox into paper (use microVMs for truly untrusted code). Config TOCTOU: CVE-2026-25725 showed Claude Code's profile failed to protect .claude/settings.json when the file didn't exist at startup, letting sandboxed code create it and inject host-privileged hooks. Agents reasoning around denylists: one session located /proc/self/root/usr/bin/npx to skirt a block, then disabled the sandbox itself to finish the task. Treat the sandbox as one layer of defense-in-depth.
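The /proc/self/root trick works because a string denylist compares the path as written, not the file it resolves to. The sketch below contrasts the two checks; naive_blocked and canonical_blocked are hypothetical helpers, and the result of the final two lines depends on whether /proc is mounted on the machine running it.

```shell
#!/bin/sh
BLOCKED="/usr/bin/npx"   # the denylisted binary from the session described above

# Naive check: compares the path string exactly as the agent wrote it.
naive_blocked() { [ "$1" = "$BLOCKED" ]; }

# Canonical check: physically resolves the containing directory before comparing.
# Fails (treats the path as not matched) when the directory cannot be resolved.
canonical_blocked() (
  dir=$(cd -P -- "$(dirname "$1")" 2>/dev/null && pwd -P) || exit 1
  [ "$dir/$(basename "$1")" = "$BLOCKED" ]
)

alias_path="/proc/self/root/usr/bin/npx"
if naive_blocked "$alias_path"; then echo "naive: blocked"; else echo "naive: missed"; fi
if canonical_blocked "$alias_path"; then echo "canonical: blocked"; else echo "canonical: missed"; fi
```

Canonicalizing closes this one hole, but a copied or renamed binary would still slip past, which is why an allowlist enforced by the OS is the sturdier design.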

↪ Your win: a safe zone with a hard edge

Retrieval practice — recall, don't peek

Question 1: A filesystem boundary without a network boundary lets the agent…

Question 2: Sandbox boundaries must be enforced at the…

Question 3: Dual-boundary sandboxing reduces approval fatigue by…

Question 4: For truly untrusted code, namespace sandboxes are weak against…

Question 5 · spaced recall from Lesson 3: The safest way to give an agent a secret is…

Ask me anything. Want a full bubblewrap profile for a coding agent, or the trade-offs between namespace sandboxes, gVisor, and Firecracker microVMs for adversarial workloads? Next: The Model Is Not the Firewall — egress control and least privilege.