Security · ~7 min
Teams argue about which model is "safer." The data says the harness around it matters far more — the same model swings from 1.1% to 27.7% overeager actions depending on the permission framework.
Overeager actions are operations outside the user's authorised scope on benign tasks — deleting an unrelated file, wiping a stale credentials backup, rewriting config nobody mentioned. OverEager-Bench measured 500 such scenarios across four products and six models. The result: the harness type, not the base model, dominates the rate.
| Framework | Permission model | Overeager rate |
|---|---|---|
| Claude Code | Permissive default | 11.8–27.7% |
| Gemini CLI | Permissive default | 10.0–16.9% |
| Codex CLI | Permissive default | 5.4–13.8% |
| OpenHands | Ask-to-continue | 0.2–4.5% |
LLMs encode authorisation as text patterns, not durable intent. Given a rule such as "do not delete files outside the working directory", the model pattern-matches candidate actions against the literal phrase rather than checking where each file actually lives. A file named auth-credentials.bak at the repo root pattern-matches as auth-related and gets deleted anyway, even though it sits outside the stated scope. Stripping the scope block raises overeager rates by 11.9–17.2 percentage points, so the block does help, but the model is doing fuzzy matching, not reasoning about consent.
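For contrast, here is a minimal sketch of what a deterministic scope check looks like when the harness does it rather than the model. The function name `is_within_scope` and the example paths are hypothetical, and the sketch assumes POSIX-style paths; it is not taken from any of the benchmarked products.

```python
from pathlib import Path


def is_within_scope(target: str, workdir: str) -> bool:
    """Return True only if `target` resolves to a location inside `workdir`.

    Hypothetical helper: the decision depends on the resolved path,
    not on what the file name looks like.
    """
    root = Path(workdir).resolve()
    # Relative targets are interpreted against the working directory;
    # an absolute target replaces the root when joined.
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents


# Assumed layout: the agent works in /repo/services/auth,
# and the backup file sits at the repo root.
workdir = "/repo/services/auth"
inside = is_within_scope("src/login.py", workdir)
outside = is_within_scope("../../auth-credentials.bak", workdir)
```

The file name plays no part in the decision: `auth-credentials.bak` is rejected because of where it resolves, however auth-related it looks.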
Ask-to-continue doesn't improve the model's inference. It interposes a checkpoint so a proposal can't become an effect without a separate consent event. The model still misjudges scope; the harness denies it the chance to act on the misjudgment.
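The checkpoint idea can be sketched in a few lines. This is an illustrative model only, not the implementation of OpenHands or any other harness; `Proposal`, `AskToContinueGate`, and the `approve` callback are hypothetical names standing in for whatever consent mechanism a real framework uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Proposal:
    """An action the model wants to take (hypothetical shape)."""
    tool: str
    argument: str


@dataclass
class AskToContinueGate:
    """Runs an effect only after a separate consent decision."""
    approve: Callable[[Proposal], bool]  # stand-in for a human or policy prompt
    executed: List[Proposal] = field(default_factory=list)
    denied: List[Proposal] = field(default_factory=list)

    def submit(self, proposal: Proposal, effect: Callable[[], None]) -> bool:
        # The model's proposal and the side effect are separate steps:
        # the effect runs only if the consent callback says yes.
        if self.approve(proposal):
            effect()
            self.executed.append(proposal)
            return True
        self.denied.append(proposal)
        return False


deleted: List[str] = []
# Assumed consent policy for the demo: the user approves only the file they asked about.
gate = AskToContinueGate(approve=lambda p: p.argument == "build/tmp.log")

gate.submit(Proposal("delete_file", "build/tmp.log"),
            lambda: deleted.append("build/tmp.log"))
gate.submit(Proposal("delete_file", "auth-credentials.bak"),
            lambda: deleted.append("auth-credentials.bak"))
```

In this sketch the model still proposes deleting `auth-credentials.bak`; the misjudgment is unchanged. The only difference is that the proposal ends up in `gate.denied` instead of taking effect.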
Anthropic's classifier-based Auto Mode runs chain-of-thought over each tool call and still leaves 17% of real overeager actions undetected. The classifier usually "correctly identifies the danger" but can't decide whether a consent statement covers a specific blast radius. Model-layer inference has a ceiling; framework-layer gating sits beneath the model and does not depend on that inference being right.
This finding applies only where overeager actions can do real damage: a native filesystem with shared credentials, real production scope within reach, benign but under-specified tasks, and approvals that are actually honoured. A hermetic sandbox, read-only tools, or a deterministic narrow allowlist makes the choice of permission mode second-order; in those setups, bound the blast radius (Lesson 8) and accept the rate. The absolute numbers come from one benchmark whose authors flag validity concerns; the relative ranking is the robust result.
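As a rough illustration of the "deterministic narrow allowlist" case, the sketch below rejects any tool that is not explicitly listed before the call reaches a handler. The tool names, the `dispatch` function, and the handler table are all hypothetical; real harnesses configure this differently.

```python
from typing import Callable, Dict

# Hypothetical read-only tool names; anything absent here is refused outright.
ALLOWED_TOOLS = frozenset({"read_file", "list_dir"})


def dispatch(tool: str,
             handlers: Dict[str, Callable[..., object]],
             *args: object) -> object:
    """Run a tool only if it is on the allowlist (illustrative sketch)."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not on the allowlist")
    return handlers[tool](*args)


# Stand-in handlers for the demo; they only return strings.
handlers: Dict[str, Callable[..., object]] = {
    "read_file": lambda path: f"contents of {path}",
    "delete_file": lambda path: f"deleted {path}",
}

result = dispatch("read_file", handlers, "README.md")
```

With a table this narrow, a `delete_file` call raises `PermissionError` whatever the model inferred about scope, which is why the permission mode becomes a second-order choice in such setups.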
Retrieval practice — recall, don't peek
Question 1. An "overeager action" is best classified as a…
Question 2. Across OverEager-Bench, the largest driver of the overeager rate is the…
Question 3. Ask-to-continue lowers overeager actions by…
Question 4. Anthropic's classifier-based Auto Mode still leaves roughly…
Question 5 · spaced recall from Lesson 8. The tools field in agent frontmatter is enforced by…