Part 2 · Structuring for Retrieval

GEO · ~6 min

Assertion Density

"Significantly faster" gives a retriever nothing to grab. "Reduces latency by 23ms at p99" gives it a discrete, attributable fact. Specificity is the single highest-impact rewrite.

Why this, for you: this family of rewrites produced the biggest measured citation lifts in the Princeton benchmark. You don't restructure anything; you swap vague qualifiers for numbers, dates, and named sources on pages you already have, and the page becomes far easier to quote verbatim.

AI engines cite what they can extract and attribute. A specific number is a discrete fact a model can lift without paraphrasing risk; "significantly" is not. Two mechanisms make specifics win.

1 Why specificity gets cited

The Princeton GEO study (Aggarwal et al., KDD 2024) tested 9 techniques across a 10,000-query benchmark. The top rewrites by source-visibility lift were Quotation Addition (+41%), Statistics Addition (+40%), and Cite Sources (+30%); Keyword Stuffing scored −10%.

2 Strong vs weak assertions

Weak (retrieval-unfriendly) | Strong (retrieval-friendly)
"significantly improves quality" | "reduces latency by 23ms at p99"
"experts say", "research shows" | "according to Martin Fowler, author of Refactoring"
"most developers use AI tools" | "75% of developers, GitHub 2024 survey"
"much faster", "far more accurate" | "across 10,000 queries in 25 domains"

Replace vague quantifiers (many, often, most), unattributed authority (experts say), and anchorless comparisons (much faster) with numbers, units, named sources, dates, and sample sizes.
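One way to apply this on pages you already have is to flag the weak phrases automatically before rewriting them by hand. The sketch below is a minimal illustration, not a tool from the study: the phrase lists and the function name `find_weak_assertions` are hypothetical, seeded only with the examples named above, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical phrase lists, seeded from the examples in this lesson.
# Extend them to match your own style guide.
CATEGORIES = {
    "vague quantifier": ["many", "often", "most", "significantly"],
    "unattributed authority": ["experts say", "research shows"],
    "anchorless comparison": ["much faster", "far more accurate"],
}


def find_weak_assertions(text):
    """Return (category, phrase, sentence) for every weak phrase found."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        for category, phrases in CATEGORIES.items():
            for phrase in phrases:
                # Whole-word, case-insensitive match.
                pattern = r"\b" + re.escape(phrase) + r"\b"
                if re.search(pattern, sentence, re.IGNORECASE):
                    hits.append((category, phrase, sentence))
    return hits


draft = "Experts say the new cache is much faster. It reduces latency by 23ms at p99."
for category, phrase, sentence in find_weak_assertions(draft):
    print(f"[{category}] '{phrase}' in: {sentence}")
```

The script only locates candidates. Deciding what number, unit, named source, date, or sample size replaces each one still requires a real source.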

3 Honest limits — don't game it

The benchmark rewards length

All three top techniques add content, and the study's position-adjusted word count (PAWC) metric rewards length, which gives content-addition techniques a structural advantage. The study even permitted fabricated statistics. The directional finding (specific beats vague) is robust; treat the exact percentages as an upper bound.
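The length bias is easiest to see with a toy calculation. The sketch below is a simplified stand-in for a position-weighted word-count share, not the paper's exact formula: the function name `position_weighted_share`, the exponential decay, and the sample sentences are all assumptions made for illustration. It shows only the arithmetic: when a source's quoted sentence gets longer and everything else stays fixed, its share rises.

```python
import math


def position_weighted_share(response, source):
    """Share of position-weighted words credited to `source`.

    `response` is a list of (source_id, sentence) pairs in answer order.
    The weighting is an assumed illustration: word count times a decay
    that favors earlier sentences.
    """
    n = len(response)
    total = 0.0
    credited = 0.0
    for position, (source_id, sentence) in enumerate(response):
        weight = len(sentence.split()) * math.exp(-position / n)
        total += weight
        if source_id == source:
            credited += weight
    return credited / total if total else 0.0


# Same answer, same positions; only source A's quoted sentence gets longer.
short_answer = [
    ("A", "Cache cuts latency."),
    ("B", "Other tools vary widely in speed."),
]
long_answer = [
    ("A", "Cache cuts latency by 23ms at p99 across 10,000 queries."),
    ("B", "Other tools vary widely in speed."),
]

short_share = position_weighted_share(short_answer, "A")
long_share = position_weighted_share(long_answer, "A")
print(f"short: {short_share:.2f}  long: {long_share:.2f}")
```

Under this assumed weighting, any technique that adds words to a quoted passage scores higher even before specificity plays a role, which is why the reported percentages are better read as a ceiling than as an expected result.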

So the rule is firm: if a claim has no real source, weaken it to a factually supportable form or remove it. Never invent a statistic. Manufactured numbers are detectable, and hedge tags add false confidence without retrieval value. And assertion density can't rescue a buried answer: if the section doesn't lead with the point (Lesson 03), the retriever misses the chunk before density ever matters.

↪ Your win: trade qualifiers for facts

Retrieval practice — recall, don't peek

Question 1: In the Princeton benchmark, the single highest-lift rewrite was…

Question 2: Keyword stuffing affected source visibility by about…

Question 3: When a claim has no real source, you should…

Question 4: Specific numbers get cited more because they…

Question 5 · spaced recall from Lesson 03: An answer-first H2 opener should be roughly…

Next, Part 3: Machine-Readable Corpora (llms.txt, schema, and crawler policy).