Reference · Canonical Language

Agentic Workflows — Glossary

The working vocabulary for this course. Once a term lives here, every lesson uses this word for it. Grows as we go.

The Inner Loop

Plan-first loop · summarize-correct-plan-implement: For non-trivial work, having the agent describe the subsystem read-only, correcting its understanding, co-creating a written plan, and only then implementing — comparing the diff against the plan. Adds the explicit correction step and the plan-as-file mechanism on top of research-plan-implement.; Avoid: applying it to one-sentence changes — the checkpoint is overhead when assumptions are certain.; Source: plan-first-loop · OpenAI — Sora with Codex
Research-Plan-Implement · RPI; QRSPI extended: Splitting agent work into three phases — gather context, decide the approach, execute — with a legitimate replan gate when implementation invalidates the plan. The cost ladder (re-read → rewrite → revert) makes the ordering non-negotiable. QRSPI adds Questioning before Research and Structure before Implement for high-stakes work.; Source: research-plan-implement · Lavaee — RPI to QRSPI
Reasoning sandwich · reasoning budget allocation: Allocating maximum reasoning compute to planning and verification and lower compute to execution, rather than a uniform level. Scored 66.5% in LangChain's harness-engineering benchmark; implementation needs disciplined execution, not creative problem-solving.; Avoid: "more thinking is always better" — uniform max reasoning on execution wastes budget and risks timeouts.; Source: LangChain — Harness Engineering
Implement-fail-fix loop: The failure mode plan-first prevents: the agent implements on a wrong assumption, patches the symptom, hits the next problem, and burns context on incremental repair. Detectable by repeated revisions of the same function or by using error output as the primary source of codebase knowledge.; Source: plan-first-loop

Where the Human Stands

Why loop / how loop: The two nested loops of software delivery. The why loop (idea → working software → evaluate) is always human-owned. The how loop (specs, code, tests → working software) is increasingly agent-owned and itself nests: feature delivery wraps story breakdown wraps code generation.; Source: humans-agents-development-loops · Morris
On the loop · vs. in / outside the loop: The human positioning that engineers the harness governing each how-loop level rather than inspecting each artefact. The defining test: when output is wrong, "in the loop" fixes the artefact; "on the loop" fixes the harness that produced it — and that fix applies to every future run.; Avoid: encoding decisions into the harness — it lowers the ceiling; encode verifiable constraints instead.; Source: humans-agents-development-loops · human-in-the-loop
Reversibility-based gating · progressive trust: Placing human approval gates before irreversible or public-impact actions and skipping them for reversible execution steps, using undo cost as the primary signal. Progressive trust then migrates a proven workflow from in-the-loop to on-the-loop to out-of-the-loop. The gate reviews the decision ("is this right?"), never execution ("is the syntax valid?").; Source: human-in-the-loop
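A minimal sketch of the gating rule above, assuming a hypothetical `Action` record that carries an undo-cost label. The `UndoCost` levels, the action names, and `needs_human_gate` are illustrative and do not come from the source; only the rule itself (gate irreversible or public-impact actions, skip reversible ones) does.

```python
from dataclasses import dataclass
from enum import Enum


class UndoCost(Enum):
    """Hypothetical undo-cost levels; the source only says undo cost is the signal."""
    TRIVIAL = 1       # e.g. edit a file in a scratch checkout
    EXPENSIVE = 2     # e.g. rerun a long job on a staging copy
    IRREVERSIBLE = 3  # e.g. drop a production table


@dataclass(frozen=True)
class Action:
    name: str
    undo_cost: UndoCost
    public_impact: bool = False


def needs_human_gate(action: Action) -> bool:
    """Gate irreversible or public-impact actions; let reversible steps run."""
    return action.public_impact or action.undo_cost is UndoCost.IRREVERSIBLE


actions = [
    Action("edit_source_file", UndoCost.TRIVIAL),
    Action("publish_release_notes", UndoCost.TRIVIAL, public_impact=True),
    Action("drop_table", UndoCost.IRREVERSIBLE),
]
gated = [a.name for a in actions if needs_human_gate(a)]
print(gated)
```

The check looks only at what the action would do to the world, not at whether the agent executed it correctly, which matches the entry's point that the gate reviews the decision rather than the execution.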

Scaling Out

Factory model · vs. assistant model: Orchestrating multiple parallel agent sessions whose primary feedback comes from automated systems (tests, CI, linters), with the human reviewing asynchronously. Contrast with the assistant model, where one human watches one agent and is the feedback loop. Requires automated feedback, monitoring, task isolation, and skill libraries as preconditions, not just the decision to run parallel agents.; Avoid: fanning out inherently sequential work (parallelism removes sequential bottlenecks, nothing else).; Source: factory-over-assistant · parallel-agent-sessions
Review bandwidth bottleneck · Brooks's Law for agents: The constraint that emerges under parallelism: agents run concurrently without attention cost to each other, but the human context-switches serially, so review and decision capacity — not session count — caps throughput. Adding sessions raises coordination overhead; linear speedup is not guaranteed.; Source: parallel-agent-sessions
Worktree isolation · one agent, one branch, one PR: Running each agent in its own git worktree — an isolated checkout on its own branch sharing git objects — so agents never overwrite each other or main; wrong output is deleted, correct output becomes a branch. Isolates files and branches but not runtime (ports, databases, caches, processes).; Avoid: assuming runtime isolation — two agents on the same port or Postgres collide despite independent checkouts.; Source: worktree-isolation
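One way to picture "one agent, one branch, one PR" is a helper that builds the `git worktree add` invocation for each task. This is a sketch under assumptions: the `agent/<task_id>` branch naming, the sibling-directory layout, and the function names are hypothetical choices, not something the source prescribes.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, task_id: str, base: str = "main") -> list[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one agent."""
    branch = f"agent/{task_id}"                      # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{task_id}"    # checkout next to the main repo
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, task_id: str, base: str = "main") -> None:
    """Run the command; raises CalledProcessError if git rejects it."""
    subprocess.run(worktree_add_command(repo, task_id, base), check=True)


cmd = worktree_add_command(Path("/srv/app"), "fix-login")
print(" ".join(cmd))
```

This isolates files and branches only. Ports, databases, and caches would still need per-agent values, which this sketch does not attempt.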
Single-branch for swarms · Destructive Command Guard (DCG): The contrarian alternative to worktrees at 10+ parallel agents: all agents commit to main, with three mechanical substitutes for branch isolation — advisory file reservations with TTL expiry, a pre-commit guard, and a shell-level Destructive Command Guard. Strictly riskier than worktrees without coordination infrastructure, fungible agents, and pre-partitioned work.; Source: single-branch-git-agent-swarms
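The "advisory file reservations with TTL expiry" mechanism can be sketched as a small table that grants a path to one agent until the claim expires. Everything here is an assumption for illustration: the `ReservationTable` class, the in-memory storage (a real swarm would need storage shared across agents), and the injectable clock used to make expiry easy to demonstrate.

```python
import time
from typing import Callable


class ReservationTable:
    """In-memory advisory reservations: path -> (agent, expiry timestamp)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._held: dict[str, tuple[str, float]] = {}

    def reserve(self, path: str, agent: str) -> bool:
        """Return True if `agent` now holds `path`; False if another live claim exists."""
        now = self._clock()
        holder = self._held.get(path)
        if holder is not None and holder[0] != agent and holder[1] > now:
            return False
        # Free, expired, or already ours: (re)take it and reset the TTL.
        self._held[path] = (agent, now + self._ttl)
        return True

    def release(self, path: str, agent: str) -> None:
        """Drop the claim, but only if `agent` is the current holder."""
        if self._held.get(path, ("", 0.0))[0] == agent:
            del self._held[path]


fake_now = [0.0]  # controllable clock so the example does not sleep
table = ReservationTable(ttl_seconds=60, clock=lambda: fake_now[0])
first = table.reserve("src/auth.py", "agent-a")
second = table.reserve("src/auth.py", "agent-b")
fake_now[0] = 61.0  # agent-a's claim has expired
third = table.reserve("src/auth.py", "agent-b")
print(first, second, third)
```

The reservation is advisory: nothing stops an agent from writing without asking, which is why the entry pairs it with a pre-commit guard and the Destructive Command Guard.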
Headless agent · print mode, claude -p: An agent run non-interactively (e.g. claude -p) that exits when done, suited to CI pipelines. --max-turns caps the number of agentic turns; it has no default, so a run without it is unbounded. PermissionRequest hooks do not fire in this mode, so enforcement shifts to PreToolUse hooks, --allowedTools, or the dontAsk/auto permission modes.; Avoid: --dangerously-skip-permissions as a CI default. Reserve it for ephemeral, bounded-blast-radius runners.; Source: headless-claude-ci
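A CI wrapper for the headless run described above might assemble the command so that the turn cap can never be omitted. This is a sketch, not the CLI's documented contract: `headless_command` is a hypothetical helper, the tool names `Read` and `Grep` are example values, and passing `--allowedTools` as one comma-joined string is an assumption to check against the current CLI reference.

```python
def headless_command(prompt: str, max_turns: int, allowed_tools: list[str]) -> list[str]:
    """Build a `claude -p` invocation with an explicit turn cap."""
    if max_turns < 1:
        # The entry notes there is no default limit, so require one here.
        raise ValueError("max_turns must be at least 1")
    cmd = ["claude", "-p", prompt, "--max-turns", str(max_turns)]
    if allowed_tools:
        # Assumption: the flag accepts a comma-separated list of tool names.
        cmd += ["--allowedTools", ",".join(allowed_tools)]
    return cmd


cmd = headless_command("Summarize the failing tests", max_turns=5, allowed_tools=["Read", "Grep"])
print(cmd)
```

Building the argument list rather than a shell string keeps the prompt from being re-parsed by a shell when the list is handed to a process runner.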
Background-to-foreground handoff · ~90% handoff: Transferring ownership of a task's nuanced tail from a background agent to a human at a planned judgment threshold — not at failure, not at 100% — carrying a distilled summary, durable artifacts (draft PR, progress file, git history), and tool parity. Stopping too late forfeits the irreversible judgment call the handoff existed to preserve.; Source: background-foreground-handoff · in-thread-side-channel

Keeping the Codebase Healthy

Snapshot rollback · speculative execution, environment pollution: Issuing docker commit before each state-modifying command during repo setup so a failed or unverified command can be reverted, converting irreversible system mutations into reversible ones. Excludes read-only commands to avoid overhead. Validated by Repo2Run (86.0% vs. baselines).; Source: experiential-setup-agents-snapshot-rollback · Hu et al. — Repo2Run
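The control flow of snapshot rollback can be sketched without Docker by swapping the container for an in-memory stand-in. This is not Repo2Run's implementation: `FakeSandbox`, the `install` command syntax, the `!` suffix that simulates a failing command, and the read-only prefix list are all hypothetical. In a real setup, `snapshot` would correspond to `docker commit` and `restore` to starting again from the committed image.

```python
READ_ONLY_PREFIXES = ("ls", "cat", "pip list")  # hypothetical allow-list


class FakeSandbox:
    """Stand-in for a container: state is the set of installed packages."""

    def __init__(self) -> None:
        self.installed: set[str] = set()
        self.snapshots: list[frozenset[str]] = []

    def snapshot(self) -> int:
        self.snapshots.append(frozenset(self.installed))
        return len(self.snapshots) - 1

    def restore(self, snapshot_id: int) -> None:
        self.installed = set(self.snapshots[snapshot_id])

    def run(self, command: str) -> bool:
        # "install X" mutates state; a trailing "!" pollutes state and then fails.
        if command.startswith("install "):
            pkg = command.split(" ", 1)[1]
            self.installed.add(pkg.rstrip("!"))
            return not pkg.endswith("!")
        return True


def run_with_rollback(sandbox: FakeSandbox, command: str) -> bool:
    """Snapshot before state-modifying commands; revert if the command fails."""
    if command.startswith(READ_ONLY_PREFIXES):
        return sandbox.run(command)  # read-only: skip the snapshot overhead
    snap = sandbox.snapshot()
    ok = sandbox.run(command)
    if not ok:
        sandbox.restore(snap)
    return ok


box = FakeSandbox()
results = [run_with_rollback(box, c) for c in ("install requests", "install broken!", "ls")]
print(results, sorted(box.installed), len(box.snapshots))
```

The failed install leaves no trace in `box.installed`, and the read-only `ls` takes no snapshot, which mirrors the two behaviours the entry describes.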
Prosecutor-judge verification: A two-role verification protocol where a prosecutor gathers evidence (runs documented tests, exercises README entry points, checks health endpoints) and a judge renders the binary verdict — because surface exit-code success and "the documented feature runs" are different properties.; Source: experiential-setup-agents-snapshot-rollback · Zhou et al. — SetupX
Failure seam: A boundary in a sequenced sub-agent pipeline where a step either succeeds under a typed contract or raises, so the pipeline surfaces which step failed rather than one generic error. The first step of the monolith-to-sub-agents refactor; splitting without a defined topology instead amplifies errors (up to a 17.2× "bag of agents" multiplier).; Avoid: decomposing tightly-coupled stateful work — context the monolith carried implicitly gets lost across schemas.; Source: monolith-to-subagents-refactor
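A failure seam can be illustrated with a two-step pipeline in which each step either returns a value of its declared type or raises an error that names the step. The ticket-handling domain, the `StepFailed` exception, and the step functions are hypothetical; the sketch only shows the shape of the boundary.

```python
from dataclasses import dataclass


class StepFailed(Exception):
    """Raised at a seam so callers learn which step broke."""

    def __init__(self, step: str, reason: str):
        super().__init__(f"step '{step}' failed: {reason}")
        self.step = step


@dataclass(frozen=True)
class Ticket:
    text: str


@dataclass(frozen=True)
class Classified:
    text: str
    label: str


def classify(ticket: Ticket) -> Classified:
    label = "bug" if "error" in ticket.text.lower() else "question"
    return Classified(ticket.text, label)


def draft_reply(item: Classified) -> str:
    if not item.text.strip():
        raise ValueError("empty ticket text")
    return f"[{item.label}] Thanks for the report."


def run_pipeline(ticket: Ticket) -> str:
    # Each entry: (step name, function, type the step must return).
    steps = [("classify", classify, Classified), ("draft_reply", draft_reply, str)]
    value = ticket
    for name, fn, expected_type in steps:
        try:
            value = fn(value)
        except Exception as exc:
            raise StepFailed(name, str(exc)) from exc
        if not isinstance(value, expected_type):
            raise StepFailed(name, f"expected {expected_type.__name__}")
    return value


print(run_pipeline(Ticket("Error on login")))
```

Calling `run_pipeline(Ticket("   "))` raises `StepFailed` with `step == "draft_reply"`, so the caller sees where the pipeline broke instead of one generic error.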
Entropy reduction agent · garbage collection of tech debt: A scheduled background agent that scans the whole codebase on a cadence for violations of encoded golden principles and opens one-violation-per-PR fixes reviewable in under a minute — proactive where CI is reactive. Captures human taste once, enforces it continuously; gate on tests and human review, since ~two-thirds of unsupervised AI refactors break code.; Source: entropy-reduction-agents · Fowler/Böckeler
Velocity-quality asymmetry: The empirical pattern that AI coding tools deliver a transient velocity burst (+281% lines in month 1, faded by month 3) while quality debt persists (+30% static warnings, +42% complexity at 6+ months), so sustainable speed requires scaling QA up front. Not a trade-off — an asymmetry.; Source: velocity-quality-asymmetry · He et al., MSR 2026
Eval-driven development · write evals before building: Defining 20–50 tasks, success criteria, and a grader before writing any agent feature code, so "done" has an objective definition and a low baseline pass rate becomes the visible improvement surface. Evals are executable specifications: teams that have them adopt new models in days rather than weeks of manual regression.; Avoid: writing tasks while building — that overfits the suite to observed behavior, baking current bugs into "correct".; Source: eval-driven-development · Anthropic — Demystifying Evals
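The "tasks, success criteria, grader" triple can be sketched as a tiny harness that exists before any agent code does. The task contents, the `exact_match` grader, and the stub agent are hypothetical, and the suite has four tasks only for brevity where the entry suggests 20 to 50.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalTask:
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    """A deliberately simple grader: case-insensitive exact match."""
    return output.strip().lower() == expected.strip().lower()


def pass_rate(
    agent: Callable[[str], str],
    tasks: list[EvalTask],
    grader: Callable[[str, str], bool] = exact_match,
) -> float:
    """Fraction of tasks whose output the grader accepts."""
    if not tasks:
        raise ValueError("an eval suite needs at least one task")
    passed = sum(grader(agent(task.prompt), task.expected) for task in tasks)
    return passed / len(tasks)


tasks = [
    EvalTask("2+2", "4"),
    EvalTask("3*3", "9"),
    EvalTask("capital of France", "Paris"),
    EvalTask("reverse 'abc'", "cba"),
]


def stub_agent(prompt: str) -> str:
    # Placeholder for the not-yet-built agent: it handles a single case.
    return "4" if prompt == "2+2" else "unknown"


baseline = pass_rate(stub_agent, tasks)
print(f"baseline pass rate: {baseline:.0%}")
```

The low baseline is the point: it is the visible improvement surface, and the same `pass_rate` call can be rerun unchanged against each new agent version or model.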