Part 2 · Structuring for Retrieval

GEO · ~7 min

Answer-First, Atomic Pages

An engine doesn't cite your page. It cites one chunk of it. Two structural moves decide whether that chunk is a tight, quotable answer — or a blurred average.

Why this, for you: these are the two cheapest, highest-leverage edits you can make to existing docs. They cost an afternoon of restructuring and change which passage a retriever pulls — the difference between being cited and being skipped on a page you already have.

RAG systems chunk your content (typically 256–512 tokens), embed each chunk, and at query time rank chunks by cosine similarity to the question. The returned chunk is what gets cited — not the page.
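To make the chunk, embed, rank loop concrete, here is a minimal sketch. It assumes a toy bag-of-words "embedding" and word-based chunk sizes in place of a real embedding model and tokenizer; the function names (`chunk`, `embed`, `rank_chunks`) are illustrative, not any specific engine's API.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_tokens: int = 256) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens words.

    Words stand in for model tokens here (an assumption for illustration).
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_chunks(page: str, query: str, max_tokens: int = 256) -> list[tuple[float, str]]:
    """Score every chunk of the page against the query, best first."""
    q = embed(query)
    scored = [(cosine(embed(c), q), c) for c in chunk(page, max_tokens)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


page = "alpha beta gamma delta chunking splits documents into passages"
ranked = rank_chunks(page, "how does chunking split documents", max_tokens=4)
top_score, top_chunk = ranked[0]
```

The top-ranked entry is a single chunk, not the page: that chunk is the unit a retriever would hand to the model to cite.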

1 Answer-first: own the opening of the chunk

When a section opens with a direct answer, the chunk's dominant semantic signal is that answer — strongly similar to queries about the topic. Open with context, caveats, or "in this article we'll explore…" and the embedding averages across preamble and answer, producing a weaker signal for any single query.
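The dilution effect can be seen even with a crude model. The sketch below assumes a bag-of-words vector as a stand-in for a real embedding, and the query and sample sentences are made up for illustration. Real embedding models behave less mechanically, but the direction of the effect is the same idea: extra preamble adds mass that the query does not match.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


QUERY = "what is chunking"
ANSWER = "Chunking splits documents into passages before embedding."
PREAMBLE = "There are many approaches and the choice depends on your needs. Let's explore the options."

# Chunk that opens with the answer vs. chunk that buries it behind preamble.
answer_first = cosine(bow(ANSWER), bow(QUERY))
preamble_first = cosine(bow(PREAMBLE + " " + ANSWER), bow(QUERY))

print(f"answer-first:   {answer_first:.3f}")
print(f"preamble-first: {preamble_first:.3f}")
```

Both chunks contain the same answer sentence; only the preamble-first one pays for the words around it.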

Anthropic's Contextual Retrieval study found that prepending a short, chunk-specific context to each chunk before indexing cut retrieval failures by up to 49% (35% with contextual embeddings alone). That is evidence that the text a chunk opens with materially affects retrieval quality.

The pattern: open every H2 with a 40–60 word self-contained answer, then elaborate. Long enough to be citable on its own; short enough to dominate the chunk before elaboration dilutes it.

    # Before: preamble first (averaged, weak embedding)
    ## Chunking Strategies
    There are many approaches to chunking. The choice depends on your document type and embedding model. Let's explore the options...

    # After: answer first (tight, citable embedding)
    ## Chunking Strategies
    Chunking splits documents into passages (256–512 tokens) before embedding. Section-aligned chunking, splitting at H2 boundaries, consistently produces the highest retrieval precision for docs.
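If you want to audit existing sections against the 40–60 word opener rule, a small check like the following may help. It is a sketch under simple assumptions: the section is passed as one markdown string, the opener is the first paragraph after the heading line, and words are whitespace-separated. The helper names are hypothetical.

```python
def opener_word_count(section_md: str) -> int:
    """Count words in the first paragraph that follows the section's heading line."""
    lines = section_md.strip().splitlines()
    # Skip the heading line if the section starts with one.
    body = lines[1:] if lines and lines[0].startswith("#") else lines
    paragraph: list[str] = []
    for line in body:
        if line.strip():
            paragraph.append(line.strip())
        elif paragraph:
            break  # the first blank line after the opener ends the paragraph
    return len(" ".join(paragraph).split())


def opener_in_range(section_md: str, low: int = 40, high: int = 60) -> bool:
    """True when the opener length falls inside the recommended band."""
    return low <= opener_word_count(section_md) <= high


good = "## Chunking Strategies\n\n" + " ".join(["word"] * 45) + "\n\nElaboration follows here."
thin = "## Chunking Strategies\n\nThere are many approaches to chunking.\n\nMore text."

print(opener_word_count(good), opener_in_range(good))
print(opener_word_count(thin), opener_in_range(thin))
```

A word count says nothing about whether the opener actually answers the question, so treat a pass as a prompt to read the paragraph, not a verdict.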

2 Atomic pages: one concept, one clean chunk

The same averaging penalty applies at the page level. A 2,000-word page covering five subtopics produces five blended embeddings, each weaker than a dedicated page's. One concept per page makes the page map cleanly to one chunk — the top passage is about exactly that concept, not a mix of tangents.

Page-level chunking scored highest: 0.648 average accuracy

NVIDIA's 2024 benchmark found page-level chunking gave the highest average retrieval accuracy, with 256–512 token chunks best for factoid queries. Keep H2 sections to 200–400 words — enough context, still semantically tight.
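To find sections that drift outside the 200–400 word band, you could scan a page's H2 sections roughly like this. The sketch assumes ATX-style `## ` headings and whitespace-separated words; it skips any line starting with `#`, which is crude (it would also skip comment lines inside code samples), and repeated heading titles overwrite each other.

```python
import re


def h2_section_lengths(page_md: str) -> dict[str, int]:
    """Map each H2 heading to the word count of the text beneath it."""
    lengths: dict[str, int] = {}
    current = None
    for line in page_md.splitlines():
        match = re.match(r"^##\s+(.*)$", line)
        if match:
            current = match.group(1).strip()
            lengths[current] = 0
        elif current is not None and not line.startswith("#"):
            # Text before the first H2 is ignored; lower-level heading lines are skipped.
            lengths[current] += len(line.split())
    return lengths


def out_of_range(page_md: str, low: int = 200, high: int = 400) -> list[str]:
    """Headings whose sections fall outside the recommended word band."""
    return [h for h, n in h2_section_lengths(page_md).items() if not low <= n <= high]


page = "# Title\nintro text\n## A\n" + "w " * 250 + "\n## B\n" + "w " * 50
print(h2_section_lengths(page))
print(out_of_range(page))
```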

Descriptive headings carry weight too: "How RAG Systems Score Sections" is a retrieval anchor; "Overview" carries zero semantic load. Headings also enable deep links: an engine can cite page.md#how-rag-chunks, not just the page.
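For a sense of how a heading becomes a deep-link target, here is a rough approximation of the GitHub-style slug convention: lowercase, drop punctuation, hyphenate whitespace. This is an assumption for illustration; renderers differ in the details (duplicate headings, Unicode, consecutive spaces), so check your own site generator's output.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate a heading's URL fragment: lowercase, no punctuation, hyphens for spaces."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # keep word characters, whitespace, hyphens
    return re.sub(r"\s+", "-", slug)


def deep_link(page: str, heading: str) -> str:
    """Join a page path and a heading into a citable deep link."""
    return f"{page}#{heading_anchor(heading)}"


print(deep_link("page.md", "How RAG Systems Score Sections"))
# page.md#how-rag-systems-score-sections
```

A descriptive heading yields a fragment that still reads as a topic; "Overview" yields `#overview`, which tells a reader or an engine nothing.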

3 Know when it doesn't help

Answer-first optimizes for chunk-based vector retrieval. It adds little when a tool embeds whole documents, uses keyword/BM25 search, or pastes the entire page into context (no retrieval step). And over-atomizing hurts: a multi-step workflow split across three pages may retrieve only step 2 with no setup. The rule is one meaningful concept per page — not one sentence.

↪ Your win: tight chunks beat long pages

Retrieval practice — recall, don't peek

Question 1: The recommended length for an answer-first section opener is…

Question 2: Opening a section with preamble instead of the answer…

Question 3: A single page covering five subtopics tends to produce…

Question 4: Answer-first structure adds little benefit when the tool…

Question 5 · spaced recall from Lesson 02: Perplexity's retrieval differs from ChatGPT's in that it…

Next: Assertion Density, or why a specific number gets cited where a vague claim doesn't.