The Production Stack

A shipping system prompt isn't paragraphs of advice — it's a structured document with named sections, hard concern boundaries, and a layout dictated by the cache. Here's the architecture, read off one that ships.

Why this, for you: every lesson so far sharpened one rule. This one zooms out to the whole document. When your instructions outgrow a single file — multiple concerns, tools, skills, safety blocks — how you arrange them starts to matter as much as what they say. This is the load-bearing skeleton a large prompt hangs on, and why each piece sits where it does.

Anthropic recommends using "XML tagging or Markdown headers to delineate sections" (context engineering). A leaked 102K-character system prompt from a Claude.ai computer-use session shows what that looks like at scale — roughly 25 top-level XML sections, each owning one concern (production-system-prompt-architecture).

1 XML sections isolate concerns

The prompt is scaffolded with named tags — <computer_use>, <harmful_content_safety>, <file_handling_rules>, <skills> — and the tags do three jobs at once:

Scope: a rule in <harmful_content_safety> applies to harmful content decisions only, with no bleed into unrelated behaviour.
Selective attention: the model finds the relevant section without re-scanning the whole prompt.
Cache stability: a section can be edited without invalidating the prefix above it.

This is the same boundary discipline as Lesson 8's layered scopes, but within one document: discrete, non-interleaved blocks so a reviewer can audit the copyright rules without reading the computer-use ones. The uppercase convention — <CRITICAL_COPYRIGHT_COMPLIANCE> — signals absolute priority, the polarity lever (Lesson 2) applied to a whole section.
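To make the boundary discipline concrete, here is a minimal sketch, assuming each concern is kept as a separate string and wrapped in its own tag at assembly time. The tag names come from the lesson; the rule text and the helper names `build_prompt` and `extract_section` are hypothetical, invented for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical section bodies: the tag names come from the lesson,
# the rule text is placeholder content invented for this sketch.
SECTIONS: Dict[str, str] = {
    "computer_use": "Take a screenshot before clicking.",
    "harmful_content_safety": "Decline requests that facilitate harm.",
    "file_handling_rules": "Write outputs to the designated folder only.",
}


def build_prompt(sections: Dict[str, str]) -> str:
    """Wrap each concern in its own named tag, one block per concern."""
    blocks = [f"<{name}>\n{body}\n</{name}>" for name, body in sections.items()]
    return "\n\n".join(blocks)


def extract_section(prompt: str, name: str) -> Optional[str]:
    """Return the body of one named section, or None if it is absent."""
    tag = re.escape(name)
    match = re.search(rf"<{tag}>\n(.*?)\n</{tag}>", prompt, flags=re.DOTALL)
    return match.group(1) if match else None


prompt = build_prompt(SECTIONS)

# A reviewer auditing the safety rules reads only that block.
safety_rules = extract_section(prompt, "harmful_content_safety")
print(safety_rules)
```

Because the blocks never interleave, pulling out one section is a single non-greedy match. That is the auditing property described above, not a claim about how the production prompt is actually assembled.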

2 Layout is dictated by the cache

Prompt caching matches on an exact prefix: any change to an earlier token invalidates everything cached after it (prompt caching). That single fact decides the whole layout — stable content goes early, volatile content goes last.

Position	Holds	Why there
Head	Temporal grounding: date, environment, location	Always in the cache prefix; fixed for the session, so it never invalidates what follows
Middle	Behavioural rules in named XML sections	Editable section-by-section above the tail
Tail	Reasoning effort, thinking-mode parameters	Runtime-variable — kept last so the prefix stays stable

Put the runtime knobs at the head and every session with a different effort level would miss the cache. Put the date at the tail and it would still be cache-stable but harder for the model to ground on early. Position is an engineering constraint here, not a style choice.
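The effect of ordering can be sketched by comparing how much prefix two sessions share under each layout. This is an illustration under stated assumptions: characters stand in for tokens, and the `<env>` and `<reasoning_effort>` tags, the date value, and the rule text are all hypothetical placeholders. Real caching operates on tokens and provider-defined cache breakpoints.

```python
import os

# Placeholder content: only the ordering matters for this sketch.
HEAD = "<env>date: 2025-01-01; os: linux</env>\n"
MIDDLE = "<file_handling_rules>\n" + "Rule text. " * 200 + "\n</file_handling_rules>\n"


def tail_layout(effort: str) -> str:
    """Stable content first, runtime parameter last."""
    return HEAD + MIDDLE + f"<reasoning_effort>{effort}</reasoning_effort>\n"


def head_layout(effort: str) -> str:
    """Runtime parameter first: the anti-pattern."""
    return f"<reasoning_effort>{effort}</reasoning_effort>\n" + HEAD + MIDDLE


def shared_prefix_len(a: str, b: str) -> int:
    """Length in characters of the longest common prefix of two prompts."""
    return len(os.path.commonprefix([a, b]))


# Two sessions that differ only in their effort level.
tail_shared = shared_prefix_len(tail_layout("low"), tail_layout("high"))
head_shared = shared_prefix_len(head_layout("low"), head_layout("high"))
print(f"tail layout shares {tail_shared} chars; head layout shares {head_shared}")
```

With the runtime value last, everything up to it is byte-identical across sessions and can be reused. With it first, the shared prefix ends at the opening tag of the very first block.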

3 Skills and tools: pointers, not payloads

Two patterns keep a large prompt lean. Skills are declared in an <available_skills> registry — a name, a trigger description, and a filesystem path — with the skill's full content loaded only on demand. A registry of 20 pointers costs far fewer tokens than 20 inlined definitions: point at the spec (Lesson 9), applied to whole capabilities.

# a pointer, not the payload:
<available_skills>
  <skill>
    <name>web_search</name>
    <description>Search the web. Use when...</description>
    <location>/path/to/SKILL.md</location>
  </skill>
</available_skills>
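One way to picture the registry pattern in code is a minimal sketch that renders only the pointers and reads a skill's file when it is triggered. `SkillPointer`, `render_registry`, and `load_skill` are hypothetical names, and the temporary file stands in for a real SKILL.md; how the production system actually loads skills is not shown in the source.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class SkillPointer:
    name: str
    description: str  # trigger text the model reads to decide relevance
    location: Path    # where the full skill file lives


def render_registry(skills: List[SkillPointer]) -> str:
    """Render only the pointers; skill bodies never enter the prompt here."""
    entries = []
    for s in skills:
        entries.append(
            "  <skill>\n"
            f"    <name>{escape(s.name)}</name>\n"
            f"    <description>{escape(s.description)}</description>\n"
            f"    <location>{escape(str(s.location))}</location>\n"
            "  </skill>"
        )
    return "<available_skills>\n" + "\n".join(entries) + "\n</available_skills>"


def load_skill(skill: SkillPointer) -> str:
    """Read the full skill text only when the skill is actually triggered."""
    return skill.location.read_text(encoding="utf-8")


# Demo with a throwaway file standing in for a real skill definition.
skill_file = Path(tempfile.mkdtemp()) / "SKILL.md"
skill_file.write_text("Full instructions. " * 100, encoding="utf-8")

pointer = SkillPointer("web_search", "Search the web.", skill_file)
registry = render_registry([pointer])
full_text = load_skill(pointer)
print(len(registry), "chars in registry vs", len(full_text), "chars in the skill")
```

The registry entry stays the same size however long the skill file grows, which is where the token saving comes from.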

Tools follow the same logic. Anthropic's advanced tool use guidance documents a defer_loading flag that keeps tool definitions masked until searched, cutting context from ~77K to ~8.7K tokens. The production prompt declares all tools statically and masks them at runtime, so the cache-stable definitions stay put while the agent sees only the relevant subset.
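The declare-statically, mask-at-runtime idea can be simulated locally. This is a sketch, not Anthropic's API: the `defer_loading` key mirrors the flag named above, while the tool names, the descriptions, and the substring matching in the hypothetical `visible_tools` helper are assumptions made for illustration.

```python
# Hypothetical tool declarations. The full list never changes between
# sessions, which is what keeps the declared prefix cache-stable.
TOOLS = [
    {"name": "read_file", "description": "Read a file from disk.", "defer_loading": False},
    {"name": "create_issue", "description": "Create an issue in the tracker.", "defer_loading": True},
    {"name": "send_email", "description": "Send an email message.", "defer_loading": True},
]


def visible_tools(tools, query=None):
    """Return always-on tools plus deferred tools whose text matches the query."""
    shown = []
    for tool in tools:
        searchable = (tool["name"] + " " + tool["description"]).lower()
        if not tool["defer_loading"]:
            shown.append(tool)  # always surfaced
        elif query and query.lower() in searchable:
            shown.append(tool)  # surfaced only after a matching search
    return shown


print([t["name"] for t in visible_tools(TOOLS)])                 # ['read_file']
print([t["name"] for t in visible_tools(TOOLS, query="email")])  # ['read_file', 'send_email']
```

The declaration list is identical in both calls; only the surfaced subset changes.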

When the architecture is overhead

This is one deployment-context capture, not a universal template, and the trade-offs reverse below a threshold. For a prompt under ~500 tokens, XML concern-isolation spends more tokens on tags than the cache hits win back. The structure also has costs at scale. Twenty-five sections are hard to audit, and rules buried in obscure blocks get ignored. Renaming or reordering a section invalidates the cache below it, so refactoring gets expensive in production. And over-isolation creates latent conflicts, such as a <safety> block that silently overrides <code_generation> without a stated precedence rule. The structure earns its keep only when the prompt is large and its sections are individually stable.
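A rough way to see the small-prompt cost is to measure what share of a prompt goes to the tags themselves. This is a sketch under stated assumptions: characters are used as a proxy for tokens, the section bodies are placeholders, and `tag_overhead` is a hypothetical helper, not a measurement of any production prompt.

```python
def tag_overhead(sections):
    """Fraction of the prompt's characters spent on tags rather than rules."""
    tag_chars = sum(len(f"<{name}></{name}>") for name in sections)
    body_chars = sum(len(body) for body in sections.values())
    return tag_chars / (tag_chars + body_chars)


# Same two tags, very different amounts of content behind them.
small = {"safety": "Be safe.", "file_handling_rules": "Save to /out."}
large = {"safety": "Rule. " * 500, "file_handling_rules": "Rule. " * 500}

print(f"small prompt: {tag_overhead(small):.0%} of characters are tags")
print(f"large prompt: {tag_overhead(large):.0%} of characters are tags")
```

The tag cost is fixed per section, so it dominates a tiny prompt and nearly vanishes in a large one. Whether it pays off still depends on the cache hits it enables, which this sketch does not model.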

↪ Your win: arrange the document, don't just write it

One concern per named section — XML/Markdown tags scope rules, aid attention, and isolate edits.
Let the cache set the order — stable head (date, env), volatile tail (runtime params).
Register skills as pointers — name, trigger, path; load full content on demand.
Declare tools statically, mask at runtime — cache-stable prefix, context-efficient surface.
Earn the overhead — this structure pays off only on large prompts with stable sections.

Retrieval practice — recall, don't peek

Question 1: Naming each concern in its own XML section primarily buys you…

Question 2: Runtime-variable parameters sit at the prompt tail so that…

Question 3: A skills registry keeps the prompt lean by storing each skill as…

Question 4: For a prompt of only a few hundred tokens, heavy XML sectioning tends to…

Question 5 · spaced recall from Lesson 12: After a long session auto-compacts, the right move to restore rule fidelity is to…

Ask me anything. Want to sketch the section layout for your own large prompt, or work out whether yours is big enough that XML isolation pays off? Next, the last content lesson: Show Your Reasoning — putting worked reasoning traces in the prompt to lift not just format, but the quality of the thinking itself.