Prompt Engineering · ~8 min
A shipping system prompt isn't paragraphs of advice — it's a structured document with named sections, hard concern boundaries, and a layout dictated by the cache. Here's the architecture, read off one that ships.
Anthropic recommends using "XML tagging or Markdown headers to delineate sections" (context engineering). A leaked 102K-character system prompt from a Claude.ai computer-use session shows what that looks like at scale — roughly 25 top-level XML sections, each owning one concern (production-system-prompt-architecture).
The prompt is scaffolded with named tags (<computer_use>, <harmful_content_safety>, <file_handling_rules>, <skills>), and the tags do three jobs at once:

- Concern isolation: <harmful_content_safety> applies to harmful-content decisions only, with no bleed into unrelated behaviour.
- Selective attention: the model finds the relevant section without re-scanning the whole prompt.
- Cache stability: a section can be edited without invalidating the prefix above it.

This is the same boundary discipline as Lesson 8's layered scopes, but within one document: discrete, non-interleaved blocks, so a reviewer can audit the copyright rules without reading the computer-use ones. The uppercase convention, as in <CRITICAL_COPYRIGHT_COMPLIANCE>, signals absolute priority: the polarity lever (Lesson 2) applied to a whole section.
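The one-concern-per-section layout can be sketched in a few lines of Python. This is a minimal illustration, not the production prompt's build process: the helper names (`render_section`, `build_prompt`, `extract_section`) and the placeholder rule text are assumptions made for the example; only the tag names come from the prompt described above.

```python
def render_section(name: str, body: str) -> str:
    """Wrap one concern's rules in its own named XML-style tag."""
    return f"<{name}>\n{body.strip()}\n</{name}>"


def build_prompt(sections: dict[str, str]) -> str:
    """Join sections as discrete, non-interleaved blocks.

    Dicts preserve insertion order, so the section order is the same
    on every build.
    """
    return "\n\n".join(render_section(n, b) for n, b in sections.items())


def extract_section(prompt: str, name: str) -> str:
    """Return the body of a single section so it can be audited alone."""
    open_tag, close_tag = f"<{name}>", f"</{name}>"
    start = prompt.index(open_tag) + len(open_tag)
    end = prompt.index(close_tag, start)
    return prompt[start:end].strip()


# Placeholder rule text (hypothetical); the real sections are far longer.
SECTIONS = {
    "file_handling_rules": "Placeholder: rules about reading and writing files.",
    "harmful_content_safety": "Placeholder: rules for harmful-content decisions only.",
    "CRITICAL_COPYRIGHT_COMPLIANCE": "Placeholder: absolute-priority copyright rules.",
}
PROMPT = build_prompt(SECTIONS)

if __name__ == "__main__":
    # A reviewer pulls out one concern without reading the others.
    print(extract_section(PROMPT, "CRITICAL_COPYRIGHT_COMPLIANCE"))
```

Because each concern lives between its own pair of tags, the audit step is a plain substring lookup rather than a read of the whole document.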
Prompt caching matches on an exact prefix: any change to an earlier token invalidates everything cached after it (prompt caching). That single fact decides the whole layout — stable content goes early, volatile content goes last.
| Position | Holds | Why there |
|---|---|---|
| Head | Temporal grounding — date, environment, location | Always in the cache prefix; never invalidated |
| Middle | Behavioural rules in named XML sections | Editing one section leaves the prefix above it cached |
| Tail | Reasoning effort, thinking-mode parameters | Runtime-variable — kept last so the prefix stays stable |
Put the runtime knobs at the head and every session with a different effort level would miss the cache. Put the date at the tail and it would still be cache-stable but harder for the model to ground on early. Position is an engineering constraint here, not a style choice.
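The effect of position can be approximated with a toy model. The sketch below assumes the cache reuses exactly the longest shared prefix of two prompts and measures it in characters; real caches work on tokens and have their own granularity rules, so treat the numbers as illustrative only. The `<reasoning_effort>` tag and the rule text are hypothetical stand-ins.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two prompts, in characters."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def knobs(effort: str) -> str:
    """Runtime-variable parameters (hypothetical tag name)."""
    return f"<reasoning_effort>{effort}</reasoning_effort>"


# Stand-in for the large, stable block of behavioural rules.
RULES = "<file_handling_rules>\n" + "stable rule text\n" * 50 + "</file_handling_rules>"

# Layout A: stable rules first, runtime knobs at the tail.
tail_low = RULES + "\n" + knobs("low")
tail_high = RULES + "\n" + knobs("high")

# Layout B: runtime knobs at the head, stable rules after.
head_low = knobs("low") + "\n" + RULES
head_high = knobs("high") + "\n" + RULES

reusable_tail = shared_prefix_len(tail_low, tail_high)
reusable_head = shared_prefix_len(head_low, head_high)

if __name__ == "__main__":
    print(f"knobs at tail: {reusable_tail} of {len(tail_low)} chars reusable")
    print(f"knobs at head: {reusable_head} of {len(head_low)} chars reusable")
```

With the knobs at the tail, two sessions that differ only in effort level share the entire rules block; with the knobs at the head, the shared prefix ends at the opening tag and the rules block is never reused.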
Two patterns keep a large prompt lean. Skills are declared in an <available_skills> registry — a
name, a trigger description, and a filesystem path — with the skill's full content loaded only on demand. A registry of
20 pointers costs far fewer tokens than 20 inlined definitions: point at the spec
(Lesson 9), applied to whole capabilities.
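A registry of pointers might look like the following sketch. Only the `<available_skills>` tag and the three fields (name, trigger description, filesystem path) come from the description above; the inner element names, the `Skill` class, the example skills, and their paths are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # trigger description: when the skill should be used
    path: Path        # where the full skill content lives on disk


def render_registry(skills: list[Skill]) -> str:
    """Render pointers only; the skill bodies never enter the prompt here."""
    lines = ["<available_skills>"]
    for s in skills:
        lines.append(
            f"  <skill><name>{s.name}</name>"
            f"<description>{s.description}</description>"
            f"<location>{s.path.as_posix()}</location></skill>"
        )
    lines.append("</available_skills>")
    return "\n".join(lines)


def load_skill(skill: Skill) -> str:
    """Read the full skill content on demand, when its trigger fires."""
    return skill.path.read_text(encoding="utf-8")


# Hypothetical skills and paths, for illustration only.
SKILLS = [
    Skill("pdf", "Use when the user asks to read or fill in a PDF.", Path("skills/pdf/SKILL.md")),
    Skill("xlsx", "Use when the user asks to create or edit a spreadsheet.", Path("skills/xlsx/SKILL.md")),
]
REGISTRY = render_registry(SKILLS)

if __name__ == "__main__":
    print(REGISTRY)
```

The prompt carries one short line per skill regardless of how long the skill's instructions are; `load_skill` is the only place the full text is read.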
Tools follow the same logic. Anthropic documents a defer_loading flag (advanced tool use) that keeps tool definitions masked until searched, cutting context from ~77K to ~8.7K tokens in its reported example. The production prompt declares all tools statically and masks them at runtime, so the cache-stable definitions stay put while the agent only ever sees the relevant subset.
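The declare-statically, mask-at-runtime idea can be simulated locally. The sketch below is not Anthropic's API: it borrows only the `defer_loading` flag name from the text above, and the `ToolDef` class, the `visible_tools` helper, the keyword matching, and the example tools are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str
    defer_loading: bool = False  # True: hidden until a search surfaces it


def visible_tools(tools: list[ToolDef], query: Optional[str] = None) -> list[ToolDef]:
    """Return the subset of tools the model sees for this turn.

    Always-loaded tools are always visible; deferred tools appear only
    when the search query matches their name or description.
    """
    q = (query or "").lower()
    visible = []
    for tool in tools:
        matches = bool(q) and (q in tool.name.lower() or q in tool.description.lower())
        if not tool.defer_loading or matches:
            visible.append(tool)
    return visible


# The full list is declared once and never changes between sessions.
TOOLS = [
    ToolDef("search_tools", "Find deferred tools by keyword."),
    ToolDef("create_invoice", "Create a billing invoice.", defer_loading=True),
    ToolDef("send_email", "Send an email message.", defer_loading=True),
]

if __name__ == "__main__":
    print([t.name for t in visible_tools(TOOLS)])
    print([t.name for t in visible_tools(TOOLS, "invoice")])
```

The static `TOOLS` list plays the role of the cache-stable declaration, while `visible_tools` decides per turn which definitions are exposed.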
This is one deployment-context capture, not a universal template — and the patterns reverse below a threshold. For
a prompt under ~500 tokens, XML concern-isolation costs more tokens than the cache hits it buys. Twenty-five sections
become hard to audit — rules in obscure blocks get ignored. Renaming or reordering a section invalidates the cache
below it, so refactoring gets expensive at production scale. And over-isolation creates latent conflicts: a
<safety> block that silently overrides <code_generation> without a stated
precedence rule. The structure earns its keep only when the prompt is large and its sections are individually stable.
Retrieval practice — recall, don't peek
Question 1: Naming each concern in its own XML section primarily buys you…
Question 2: Runtime-variable parameters sit at the prompt tail so that…
Question 3: A skills registry keeps the prompt lean by storing each skill as…
Question 4: For a prompt of only a few hundred tokens, heavy XML sectioning tends to…
Question 5 · spaced recall from Lesson 12: After a long session auto-compacts, the right move to restore rule fidelity is to…