"Significantly faster" gives a retriever nothing to grab. "Reduces latency by 23ms at p99" gives it a discrete, attributable fact. Specificity is the single highest-impact rewrite.
Why this, for you: the three biggest measured citation lifts in the Princeton benchmark all came from adding specifics (quotations, statistics, and cited sources). You don't restructure anything: you swap vague qualifiers for numbers, dates, and named sources on pages you already have, and the page becomes far easier to quote verbatim.
AI engines cite what they can extract and attribute. A specific number is a discrete fact a model can
lift without paraphrasing risk; "significantly" is not. Two mechanisms make specifics win.
1 Why specificity gets cited
Token-level matching: a query like "how much does X improve Y" matches a numeric passage more precisely than one full of "substantially" or "often."
Attribution confidence: a quote attributed to a named, credentialed person, or a statistic with a date and a source, is easier to cite verbatim than a generality. Citations also signal external validation, which lowers the engine's uncertainty about the claim.
The Princeton GEO study (Aggarwal et al., KDD 2024) tested 9 techniques on a 10,000-query benchmark. The top three rewrites raised source visibility by +41% (Quotation Addition), +40% (Statistics Addition), and +30% (Cite Sources), while Keyword Stuffing scored −10%.
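The reported figures read most naturally as relative lifts over an unmodified baseline. That reading is an assumption here, and the baseline score of 20.0 below is hypothetical, not a number from the study. The following is a minimal sketch of the arithmetic:

```python
def relative_lift(baseline: float, treated: float) -> float:
    """Percentage change of `treated` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treated - baseline) / baseline * 100.0


def apply_lift(baseline: float, lift_pct: float) -> float:
    """Score after applying a relative lift given in percent."""
    return baseline * (1.0 + lift_pct / 100.0)


# Hypothetical baseline visibility score; not a value reported by the study.
BASELINE = 20.0
# Lifts as quoted in the text above, in percent.
REPORTED = {
    "Quotation Addition": 41,
    "Statistics Addition": 40,
    "Cite Sources": 30,
    "Keyword Stuffing": -10,
}

for technique, lift in REPORTED.items():
    print(f"{technique:<20} {apply_lift(BASELINE, lift):5.1f}")
```

Because the lift is relative, the absolute gain scales with the baseline. A +41% lift takes a score of 20.0 to about 28.2, but it takes a score of 2.0 only to about 2.8.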
2 Strong vs weak assertions
| Weak (retrieval-unfriendly) | Strong (retrieval-friendly) |
|---|---|
| "significantly improves quality" | "reduces latency by 23ms at p99" |
| "experts say", "research shows" | "according to Martin Fowler, author of Refactoring" |
| "most developers use AI tools" | "75% of developers, GitHub 2024 survey" |
| "much faster", "far more accurate" | "across 10,000 queries in 25 domains" |
Replace vague quantifiers (many, often, most), unattributed authority (experts say), and
anchorless comparisons (much faster) with numbers, units, named sources, dates, and sample sizes.
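The replacement rule can be run as a rough pre-publish check. This is a minimal sketch that assumes regex matching is good enough to surface candidates for a human to rewrite. The phrase lists are illustrative assumptions, not a vetted lexicon.

```python
import re
from dataclasses import dataclass

# Illustrative patterns for the three weak-assertion categories above.
# The word lists are assumptions for this sketch, not an exhaustive lexicon.
PATTERNS = {
    "vague qualifier": r"\b(many|most|often|several|numerous|significantly|substantially)\b",
    "unattributed authority": r"\b(experts say|research shows|studies show)\b",
    "anchorless comparison": r"\b(much|far|way) (faster|slower|better|worse|more \w+|less \w+)\b",
}


@dataclass
class Flag:
    category: str  # which weak-assertion category matched
    phrase: str    # the matched text, as written in the draft
    start: int     # character offset of the match


def lint(text: str) -> list[Flag]:
    """Return every weak-assertion match, ordered by position in the text."""
    flags = []
    for category, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            flags.append(Flag(category, m.group(0), m.start()))
    return sorted(flags, key=lambda f: f.start)


draft = "Experts say the new cache is much faster and most teams see gains."
for f in lint(draft):
    print(f"{f.category}: {f.phrase!r}")
```

A clean pass only means that none of the listed phrases appeared. It does not mean that the remaining claims are sourced.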
3 Honest limits — don't game it
The benchmark rewards length
All three top techniques add content, and the study's Position-Adjusted Word Count (PAWC) metric rewards length, which gives content addition a structural advantage. The study even permitted fabricated statistics. The directional finding (specific beats vague) is robust; treat the exact percentages as an upper bound.
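The length bias is visible in the shape of the metric itself. The sketch below approximates a position-adjusted word count: every sentence in a generated answer that cites the source contributes its word count, discounted exponentially by the sentence's position. The function name, the 0-based positions, and the omission of any normalization across sources are simplifying assumptions. This is not the paper's exact definition.

```python
import math


def pawc(response: list[tuple[str, set[str]]], source: str) -> float:
    """Approximate position-adjusted word count for `source`.

    `response` is the generated answer as ordered (sentence, cited_ids) pairs.
    Each sentence citing `source` adds its word count, weighted by
    exp(-position / total_sentences), so earlier sentences count more.
    No normalization across sources is applied (a simplification).
    """
    total = len(response)
    score = 0.0
    for pos, (sentence, cited) in enumerate(response):  # pos is 0-based
        if source in cited:
            score += len(sentence.split()) * math.exp(-pos / total)
    return score


# Two hypothetical answers that state the same fact for source "A".
short = [
    ("Caching cut p99 latency by 23ms.", {"A"}),
    ("Other approaches exist.", {"B"}),
]
padded = [
    ("Caching cut p99 latency by 23ms in our tests, which covered "
     "a wide range of request sizes and traffic patterns.", {"A"}),
    ("Other approaches exist.", {"B"}),
]

print(pawc(short, "A"), pawc(padded, "A"))
```

Both answers carry the same fact, yet the padded sentence scores 20.0 against 6.0. That is more than three times higher with no added information, which is the structural advantage that content-addition techniques get under this kind of metric.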
The rule is firm: if a claim has no real source, weaken it to a factually supportable form or remove it. Never invent a statistic. Manufactured numbers are detectable, and hedge tags add false confidence without retrieval value. Assertion density also can't rescue a buried answer: if the section doesn't lead with the point (Lesson 03), the retriever misses the chunk before density ever matters.
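The no-fabrication rule reduces to a three-way decision: publish with a named source, publish a weaker but supportable wording, or cut the claim. This is a minimal sketch. The `Claim` record, its fields, and the sample claims are hypothetical, and the code deliberately has no branch that supplies a missing source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str                        # the specific assertion as drafted
    source: Optional[str] = None     # named, checkable source, if one exists
    weakened: Optional[str] = None   # factually supportable fallback wording


def triage(claim: Claim) -> Optional[str]:
    """Return the sentence to publish, or None to cut the claim.

    An unsourced claim is only ever weakened or dropped; no source is invented.
    """
    if claim.source:
        return f"{claim.text} ({claim.source})"
    if claim.weakened:
        return claim.weakened
    return None


# Hypothetical drafts: one sourced, one with a weaker fallback, one with neither.
drafts = [
    Claim("Statistics Addition raised visibility by about 40%",
          source="Aggarwal et al., KDD 2024"),
    Claim("Our tool is 3x faster than competitors",
          weakened="Our tool was faster in our own informal tests"),
    Claim("90% of engineers prefer our tool"),
]
for d in drafts:
    print(triage(d))
```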
↪ Your win: trade qualifiers for facts
Replace vague claims with specifics — numbers, units, dates, sample sizes, named sources.
Add attributed quotes and inline citations — Princeton: +41% / +40% / +30% visibility for quotes, stats, sources.
Drop keyword stuffing — it scored −10%; it's a token cost with no semantic gain.
Never fabricate — no real source? Weaken or cut the claim. Pair density with answer-first, or the chunk is missed anyway.
Retrieval practice — recall, don't peek
Question 1: In the Princeton benchmark, the single highest-lift rewrite was…
Question 2: Keyword stuffing affected source visibility by about…
Question 3: When a claim has no real source, you should…
Question 4: Specific numbers get cited more because they…
Question 5 (spaced recall from Lesson 03): An answer-first H2 opener should be roughly…
Next, Part 3: Machine-Readable Corpora (llms.txt, schema, and crawler policy).