Reference · Canonical Language

GEO — Glossary

The working vocabulary for the Generative Engine Optimization course. Once a term lives here, every lesson uses this word for it.

Foundations

Generative Engine Optimization · aliases: GEO, AEO, Answer Engine Optimization
The practice of structuring content so AI answer engines cite it — not just rank it. The optimization target is citation share inside a synthesized answer, not a position in a list of links.
Avoid: "AI SEO" — the signals and metrics differ; rank is not a GEO proxy.
Source: what-is-geo
Citation share · citation share of voice
The GEO analogue of rank: the percentage of AI responses (or of total category citations) that include your content. The primary thing GEO optimizes for.
Source: what-is-geo · measuring-geo-performance
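A minimal sketch of the two citation-share calculations. It assumes you have logged a sample of AI answers to your tracked queries, along with the domains each answer cited. The `SampledResponse` record, both function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampledResponse:
    """One AI answer to a tracked query and the domains it cited (hypothetical schema)."""
    query: str
    cited_domains: list[str]


def response_share(responses: list[SampledResponse], domain: str) -> float:
    """Share of sampled responses that cite `domain` at least once."""
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if domain in r.cited_domains)
    return hits / len(responses)


def citation_share_of_voice(responses: list[SampledResponse], domain: str) -> float:
    """`domain`'s citations as a fraction of all citations in the sample."""
    total = sum(len(r.cited_domains) for r in responses)
    if total == 0:
        return 0.0
    ours = sum(r.cited_domains.count(domain) for r in responses)
    return ours / total


sample = [
    SampledResponse("best crm for startups", ["example.com", "reddit.com"]),
    SampledResponse("crm pricing comparison", ["reddit.com", "wikipedia.org"]),
    SampledResponse("how to choose a crm", ["example.com", "youtube.com", "reddit.com"]),
    SampledResponse("crm vs spreadsheet", ["wikipedia.org"]),
]
print(response_share(sample, "example.com"))           # 0.5  (2 of 4 responses)
print(citation_share_of_voice(sample, "example.com"))  # 0.25 (2 of 8 citations)
```

The two numbers diverge because a response that cites you once among several sources counts fully toward response share but only fractionally toward share of voice.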
Brand mentions · earned media
Off-site references to your brand in forums, reviews, and press. A stronger predictor of AI citation than backlinks; most AI citations come from third-party platforms (Reddit 40.1%, Wikipedia 26.3%, YouTube 23.5% per Semrush).
Avoid: reading "deprioritize backlinks" as "abandon backlinks". It means reallocate effort; link authority remains a secondary positive signal.
Source: seo-vs-geo · topical-authority

Engines & Crawlers

Retrieval bot · search bot
A crawler that powers real-time citations in AI chat and search (e.g. OAI-SearchBot, Claude-SearchBot, PerplexityBot). Allow these in robots.txt to stay citation-eligible.
Source: how-ai-engines-cite · ai-crawler-policy
Training bot · training scraper
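A crawler that ingests content for model training (e.g. GPTBot, ClaudeBot). Disallowing it opts you out of training datasets without affecting citation eligibility, because it uses a different user-agent than the retrieval bot. Google-Extended belongs in this tier as a robots.txt token, although Google does not crawl under a separate Google-Extended user-agent.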
Avoid: "blocking AI bots" as one action — training and retrieval are separate tiers with separate user-agents.
Source: ai-crawler-policy
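The split between the two tiers can be checked with Python's standard-library `urllib.robotparser`. The sketch assumes a robots.txt that allows the three retrieval bots named above and disallows the training bots; `example.com` is a placeholder. Google-Extended appears as a robots.txt token even though Google does not crawl under that name. The parser matches on the short product token (e.g. `GPTBot`), so pass that rather than a full User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# Assumed policy: retrieval bots allowed, training bots disallowed, everyone else allowed.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def load_policy(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = load_policy(ROBOTS_TXT)
page = "https://example.com/guides/what-is-geo"
for agent in ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "GPTBot", "ClaudeBot"]:
    print(f"{agent:17} {'allowed' if policy.can_fetch(agent, page) else 'blocked'}")
```

Because the training and retrieval groups name different user-agents, disallowing GPTBot leaves OAI-SearchBot untouched. Honoring the file is still up to each crawler.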
Freshness weighting
How heavily an engine favors recently published content in source selection. Perplexity weights freshness most strongly; ChatGPT and Gemini lean toward authority. Tuning content for one engine's preference can cost visibility on engines that weight the other signal.
Source: how-ai-engines-cite
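A toy model makes the trade-off concrete. It assumes an engine blends a 0–1 authority score with an exponentially decaying freshness score. The 30-day half-life, the weights, and both pages are invented for illustration and do not describe how Perplexity, ChatGPT, or Gemini actually score sources.

```python
def freshness(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 on publication, 0.5 after one half-life (assumed model)."""
    return 0.5 ** (age_days / half_life_days)


def source_score(authority: float, age_days: float, freshness_weight: float) -> float:
    """Hypothetical engine score: weighted blend of authority and freshness."""
    return (1 - freshness_weight) * authority + freshness_weight * freshness(age_days)


fresh_post = {"authority": 0.4, "age_days": 3}
evergreen_guide = {"authority": 0.9, "age_days": 400}

for label, weight in [("freshness-heavy engine", 0.7), ("authority-heavy engine", 0.2)]:
    a = source_score(**fresh_post, freshness_weight=weight)
    b = source_score(**evergreen_guide, freshness_weight=weight)
    winner = "fresh post" if a > b else "evergreen guide"
    print(f"{label}: {winner} wins ({a:.2f} vs {b:.2f})")
```

The same two pages swap places as the weight changes, which is why a page tuned to win on one engine can lose on another.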

Structuring for Retrieval

Chunk
A passage (typically 256–512 tokens) that a RAG system splits a document into, embeds, and scores against a query. The retrieved chunk — not the page — is what gets cited.
Avoid: "the page gets cited" — engines cite passages, not whole pages.
Source: atomic-pages-and-chunking
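A minimal fixed-size chunker, assuming whitespace-separated words stand in for model tokens. Production RAG pipelines count tokens with the embedding model's own tokenizer and often split on headings or paragraphs first; the 32-token overlap is an assumed default, not a standard.

```python
def chunk_tokens(tokens: list[str], size: int = 256, overlap: int = 32) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window reached the end of the document
            break
    return chunks


document = " ".join(f"w{i}" for i in range(600))
chunks = chunk_tokens(document.split())
print([(c[0], c[-1], len(c)) for c in chunks])
# [('w0', 'w255', 256), ('w224', 'w479', 256), ('w448', 'w599', 152)]
```

Each returned window is embedded and scored on its own, so each one is a separate candidate for citation.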
Answer-first writing · lead-with-the-answer
Placing a direct 40–60-word answer at the start of every H2 section before elaborating, so the chunk embedding is dominated by the answer rather than averaged across preamble. In Anthropic's Contextual Retrieval study, prepending chunk-specific context to each chunk's opening text (combined with contextual BM25) cut failed retrievals by 49%.
Source: answer-first-writing
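One way to audit a draft for answer-first structure is a small linter that measures the opening paragraph under each H2. The sketch assumes Markdown source with `## ` headings and treats the first blank-line-delimited paragraph as the answer; the function name is hypothetical, and the 40–60 range comes from the definition above.

```python
import re


def check_answer_first(
    markdown: str, min_words: int = 40, max_words: int = 60
) -> list[tuple[str, int, bool]]:
    """For each H2, return (heading, words in its first paragraph, whether the count is in range)."""
    # re.split with one capture group yields [preamble, heading1, body1, heading2, body2, ...].
    parts = re.split(r"^## +(.+)$", markdown, flags=re.MULTILINE)
    results = []
    for heading, body in zip(parts[1::2], parts[2::2]):
        first_paragraph = re.split(r"\n\s*\n", body.strip())[0]
        words = len(first_paragraph.split())
        results.append((heading.strip(), words, min_words <= words <= max_words))
    return results


draft = (
    "# Glossary\n\nIntro text.\n\n"
    "## What is GEO?\n\n" + " ".join(["word"] * 45) + "\n\nMore detail.\n\n"
    "## Why it matters\n\nShort answer only.\n"
)
for heading, words, ok in check_answer_first(draft):
    print(f"{'OK ' if ok else 'FIX'} {heading!r}: {words} words")
```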
Atomic page · one concept per page
A page covering exactly one concept, so it maps cleanly to one chunk with a tight embedding. Multi-concept pages produce blended, diluted embeddings. The rule is one meaningful concept per page — not one sentence.
Source: atomic-pages-and-chunking
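A toy calculation shows the dilution. It assumes each concept has its own direction in embedding space (orthogonal unit vectors here, which real embeddings are not) and models a multi-concept page's embedding as the mean of its concept vectors, a simplification of mean pooling.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of several vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy concept directions (assumption: three unrelated concepts, orthogonal by construction).
pricing = [1.0, 0.0, 0.0]
setup = [0.0, 1.0, 0.0]
security = [0.0, 0.0, 1.0]

atomic_page = pricing                                   # one concept per page
blended_page = mean_vector([pricing, setup, security])  # three concepts on one page

query = [1.0, 0.0, 0.0]  # a query squarely about pricing
print(round(cosine(query, atomic_page), 3))   # 1.0
print(round(cosine(query, blended_page), 3))  # 0.577
```

With three unrelated concepts on one page, the same pricing query scores 1/√3 ≈ 0.577 against the blended page instead of 1.0, so an atomic pricing page elsewhere would outscore it.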
Assertion density
The degree to which prose uses specific, attributable facts (numbers, units, dates, sample sizes, named-source quotes) rather than vague qualifiers. Covers the highest-lift rewrites in the Princeton benchmark (Quotation +41%, Statistics +40%, Cite Sources +30%).
Avoid: fabricating statistics to raise density. If a claim has no real source, weaken it or remove it.
Source: assertion-density
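A rough audit heuristic, assuming that numbers (including percentages and dates) and quoted spans approximate specific assertions while a fixed word list approximates vague qualifiers. The word list, regexes, and scoring formula are illustrative choices, not the Princeton benchmark's method, and the two example sentences are invented placeholders.

```python
import re

# Illustrative assumptions, not a published rubric.
VAGUE_QUALIFIERS = {"many", "some", "often", "significantly", "various", "several",
                    "numerous", "very", "quite", "generally", "arguably"}
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")  # e.g. 2024, 1,200, 3.5, 41%
QUOTE = re.compile(r'"[^"]{10,}"')          # quoted spans of 10+ characters


def assertion_density(text: str) -> dict[str, float]:
    """Count specific vs. vague markers; density is specific / (specific + vague)."""
    words = re.findall(r"[a-z']+", text.lower())
    vague = sum(1 for w in words if w in VAGUE_QUALIFIERS)
    specific = len(NUMBER.findall(text)) + len(QUOTE.findall(text))
    total = specific + vague
    return {"specific": specific, "vague": vague, "density": specific / total if total else 0.0}


before = "Many teams see significantly better results with various tools."
after = "In a 2024 survey of 1,200 teams, 41% cut setup time to 3.5 days."
print(assertion_density(before))  # {'specific': 0, 'vague': 3, 'density': 0.0}
print(assertion_density(after))   # {'specific': 4, 'vague': 0, 'density': 1.0}
```

A higher score only counts if every figure traces back to a real source; the heuristic cannot tell a sourced number from an invented one.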
Keyword stuffing
Repeating target keywords to inflate keyword density. A historical SEO tactic that carries over to GEO as a negative: it scored −10% on source visibility in the Princeton benchmark while consuming token budget.
Source: assertion-density · seo-vs-geo
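Keyword density, the SEO metric that stuffing inflates, is simple to compute. This sketch assumes word-level matching of a multi-word phrase and also reports how many words go to repeats after the first mention, a rough proxy for the token budget the definition mentions. The function name and the repeated-words metric are hypothetical.

```python
import re


def keyword_stats(text: str, keyword: str) -> tuple[float, int]:
    """Return (phrase occurrences per word, words spent on repeats after the first mention)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = re.findall(r"[a-z0-9']+", keyword.lower())
    if not words or not phrase:
        return 0.0, 0
    n = len(phrase)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return hits / len(words), max(hits - 1, 0) * n


stuffed = ("Best CRM software: our best CRM software is the best CRM software "
           "for teams needing best CRM software.")
density, repeated = keyword_stats(stuffed, "best CRM software")
print(f"density={density:.2f}, repeated words={repeated} of {len(stuffed.split())}")
```

Half of this 18-word sentence repeats the phrase, words that add no new assertion for an engine to cite.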

The Technical Layer

llms.txt · llms-full.txt
A curated Markdown index at the site root giving AI agents a pre-filtered entry point — fetch the file, pick a section, fetch only the linked pages needed. Agent navigation infrastructure, not a citation or ranking signal. The spec requires only an H1.
Avoid: expecting llms.txt to lift citations — no provider confirms reading it at inference; use schema for citation lift.
Source: llms-txt
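The fetch, pick, fetch pattern can be sketched as a small parser. It assumes the common llms.txt layout (an H1 title, an optional `> ` summary line, and H2 sections of `- [name](url): notes` links) and handles only that subset; the function name and sample file are hypothetical.

```python
import re

LINK = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> dict:
    """Extract title, summary, and {section: [links]} from an llms.txt file."""
    title, summary, sections, current = None, None, {}, None
    for line in text.splitlines():
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and (m := LINK.match(line)):
            name, url, notes = m.groups()
            sections[current].append({"name": name, "url": url, "notes": notes or ""})
    if title is None:
        raise ValueError("llms.txt needs an H1 title")
    return {"title": title, "summary": summary, "sections": sections}


SAMPLE = """\
# Example Docs

> Reference docs for the Example API.

## Guides
- [Quickstart](https://example.com/quickstart.md): install and first request
- [Auth](https://example.com/auth.md)

## Optional
- [Changelog](https://example.com/changelog.md)
"""

index = parse_llms_txt(SAMPLE)
# An agent answering a setup question would fetch only the "Guides" links.
to_fetch = [link["url"] for link in index["sections"]["Guides"]]
print(to_fetch)
```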
Structured data · schema, JSON-LD
Machine-readable markup (FAQPage, HowTo, DefinedTerm) that pre-packages content in formats engines reuse. The technical lever that does lift citation (FAQPage markup is reported to yield 2.7×–3.2× citation gains), with the benefit accruing at indexing time, not on live fetch.
Avoid: stale or mismatched schema — body text drifting from the markup makes engines deprioritize the page.
Source: schema-and-structured-data
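A minimal sketch of generating FAQPage JSON-LD from question/answer pairs, plus a crude drift check for the stale-markup problem. The schema.org types (`FAQPage`, `Question`, `Answer`) are real. The helper names, and the rule that each marked-up answer must appear verbatim (after whitespace and case normalization) in the page body, are assumptions.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as a schema.org FAQPage JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q,
             "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def _normalize(s: str) -> str:
    return " ".join(s.split()).lower()


def drifted_questions(pairs: list[tuple[str, str]], body_text: str) -> list[str]:
    """Questions whose marked-up answer no longer appears in the visible page text."""
    body = _normalize(body_text)
    return [q for q, a in pairs if _normalize(a) not in body]


pairs = [
    ("What is GEO?", "GEO is the practice of structuring content so AI answer engines cite it."),
    ("What is a chunk?", "A chunk is a passage that a RAG system embeds and scores."),
]
body = "GEO is the practice of structuring content so AI answer engines cite it. (chunk section rewritten)"
print(faq_jsonld(pairs))
print(drifted_questions(pairs, body))  # ['What is a chunk?']
```

The JSON output goes inside a `<script type="application/ld+json">` element on the page; running the drift check whenever the body changes flags markup that no longer matches the visible text.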
Three-tier crawler taxonomy
The robots.txt model for AI crawlers: retrieval bots (allow), training scrapers (disallow), and non-compliant bots like Bytespider (CDN/WAF block). robots.txt is advisory: ChatGPT-User was exempted from robots.txt rules in Dec 2025, and Perplexity has been documented evading it.
Source: ai-crawler-policy
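Because robots.txt is advisory, the third tier has to be enforced where requests arrive. A sketch of the classification an edge rule might apply, assuming case-insensitive substring matching on the User-Agent header. The token lists come from this glossary and are not exhaustive, the sample User-Agent strings are simplified, and since User-Agent headers can be spoofed, production rules often pair them with published IP ranges or reverse-DNS checks.

```python
from enum import Enum


class Tier(Enum):
    RETRIEVAL = "allow in robots.txt"
    TRAINING = "disallow in robots.txt"
    NON_COMPLIANT = "block at CDN/WAF"
    UNKNOWN = "default policy"


# Product tokens from this glossary; illustrative, not exhaustive.
TIER_TOKENS = {
    Tier.RETRIEVAL: ("oai-searchbot", "claude-searchbot", "perplexitybot"),
    Tier.TRAINING: ("gptbot", "claudebot"),
    Tier.NON_COMPLIANT: ("bytespider",),
}


def classify(user_agent: str) -> Tier:
    """Map a User-Agent header to its tier by case-insensitive token match."""
    ua = user_agent.lower()
    for tier, tokens in TIER_TOKENS.items():
        if any(token in ua for token in tokens):
            return tier
    return Tier.UNKNOWN


def edge_status(user_agent: str) -> int:
    """Status an edge rule might return: only the non-compliant tier is refused outright."""
    return 403 if classify(user_agent) is Tier.NON_COMPLIANT else 200


for ua in ["Mozilla/5.0 (compatible; GPTBot/1.1)",
           "Mozilla/5.0 (compatible; Bytespider)",
           "Mozilla/5.0 (compatible; PerplexityBot/1.0)"]:
    print(classify(ua).name, edge_status(ua))
```

Training scrapers pass the edge in this sketch because compliant ones are expected to stop at the robots.txt disallow; only the tier known to ignore robots.txt is refused at the edge.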

Measurement & Strategy

Share of Model · SoM
The percentage of AI responses where your brand appears for relevant category queries. A GEO-native metric measured by repeated sampling, since engines expose no impression counts or rank API.
Source: measuring-geo-performance
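Because answers vary from run to run, SoM is an estimate from repeated samples, and reporting an uncertainty range alongside it helps. This sketch assumes you have already collected answer texts for a query set. Brand matching is a simple whole-word regex (real trackers also handle aliases and misspellings), and the Wilson score interval is one standard choice for a binomial proportion, not something the engines or this course prescribe. The sample answers and brand names are hypothetical.

```python
import math
import re


def mentions_brand(answer: str, brand: str) -> bool:
    """Case-insensitive whole-word match (assumption: no alias handling)."""
    return re.search(rf"\b{re.escape(brand)}\b", answer, flags=re.IGNORECASE) is not None


def share_of_model(answers: list[str], brand: str, z: float = 1.96) -> tuple[float, float, float]:
    """Return (SoM point estimate, lower bound, upper bound) using a Wilson score interval."""
    n = len(answers)
    if n == 0:
        raise ValueError("need at least one sampled answer")
    k = sum(mentions_brand(a, brand) for a in answers)
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


samples = [
    "Acme and Globex are the usual picks for small teams.",
    "Globex leads on integrations.",
    "Many reviewers recommend Acme for onboarding.",
    "There is no single best option.",
]
som, low, high = share_of_model(samples, "Acme")
print(f"SoM {som:.0%} (95% CI {low:.0%}-{high:.0%}, n={len(samples)})")
```

Four samples leave a very wide interval. The width shrinks roughly with the square root of the sample count, which is why SoM tracking samples each query many times.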
Attribution gap
The blind spot where an AI-discovered visit lands days later and registers as direct traffic — the discovery touch is invisible, so GEO's contribution to a conversion can't be traced from analytics alone.
Avoid: treating citation counts as a revenue proxy — visibility is not intent.
Source: measuring-geo-performance
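A short sketch of how last-touch attribution erases the discovery touch. It assumes sessions are classified by referrer host; the host list is illustrative, AI assistants do not always pass a referrer, and the two-session journey is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative AI assistant referrer hosts (assumption; extend from your own logs).
AI_REFERRER_HOSTS = {"chatgpt.com", "perplexity.ai", "www.perplexity.ai",
                     "gemini.google.com", "claude.ai", "copilot.microsoft.com"}


def channel(referrer: Optional[str]) -> str:
    """Classify one session by its referrer, the way a basic analytics setup would."""
    if not referrer:
        return "direct"
    host = urlparse(referrer).hostname or ""
    return "ai_assistant" if host in AI_REFERRER_HOSTS else "other_referral"


# Hypothetical journey: discovery via an AI citation, conversion days later.
journey = [
    ("day 1", "https://chatgpt.com/"),  # clicked a citation in an AI answer
    ("day 4", None),                    # typed the URL directly, then converted
]
touches = [channel(ref) for _, ref in journey]
print("last-touch credit:", touches[-1])  # direct
print("discovery touch:", touches[0])     # ai_assistant
```

Last-touch credit goes to "direct" even though the journey started in an AI answer. If the user never clicked and simply read the answer, the day-1 session would not exist at all, and no referrer logic can recover it.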
Topical authority · entity coverage
Comprehensive, interconnected coverage of a topic domain that makes AI systems recognize a site as the authoritative entity for a subject. Returns compound as coverage grows — many linked pages on one subject outperform one excellent page on a subtopic.
Source: topical-authority
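Interconnection can be audited from a site's internal links. This sketch assumes you have a map from each page in a topic cluster to the cluster pages it links to, and it flags pages that no other cluster page links to. The URLs are hypothetical, and an orphan check is only one signal of interconnected coverage, not a measure of authority.

```python
def orphaned_pages(links: dict[str, set[str]]) -> set[str]:
    """Cluster pages that receive no internal link from any other cluster page."""
    pages = set(links)
    linked_to = {dst for src, dsts in links.items() for dst in dsts
                 if dst != src and dst in pages}
    return pages - linked_to


cluster = {
    "/geo": {"/geo/chunking", "/geo/schema"},  # hub page
    "/geo/chunking": {"/geo/schema"},
    "/geo/schema": set(),
    "/geo/llms-txt": {"/geo"},                 # links out, but nothing links in
}
print(orphaned_pages(cluster))  # {'/geo/llms-txt'}
```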