Reference · Canonical Language

GEO — Glossary

The working vocabulary for the Generative Engine Optimization course. Once a term lives here, every lesson uses this word for it.

Foundations

Generative Engine Optimization · aliases: GEO, AEO, Answer Engine Optimization: The practice of structuring content so AI answer engines cite it — not just rank it. The optimization target is citation share inside a synthesized answer, not a position in a list of links.; Avoid: "AI SEO" — the signals and metrics differ; rank is not a GEO proxy.; Source: what-is-geo
Citation share · citation share of voice: The GEO analogue of rank: the percentage of AI responses that cite your content, or your share of all citations in the category. The primary thing GEO optimizes for.; Source: what-is-geo · measuring-geo-performance
Brand mentions · earned media: Off-site references to your brand in forums, reviews, and press. A stronger predictor of AI citation than backlinks; most AI citations come from third-party platforms (Reddit 40.1%, Wikipedia 26.3%, YouTube 23.5% per Semrush).; Avoid: reading "deprioritize backlinks" as "abandon backlinks". It means reallocate effort; link authority remains a secondary positive signal.; Source: seo-vs-geo · topical-authority
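The two readings of citation share defined above can be computed from a sample of AI answers. The sketch below is a minimal illustration, not a course-supplied tool: the `Response` record, the function names, and the sample domains are all hypothetical, and it assumes you have already collected which domains each answer cited.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One sampled AI answer and the domains it cited (hypothetical record shape)."""
    query: str
    cited_domains: list[str]


def citation_share(responses: list[Response], domain: str) -> float:
    """Percentage of responses that cite `domain` at least once."""
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if domain in r.cited_domains)
    return 100.0 * hits / len(responses)


def share_of_category_citations(responses: list[Response], domain: str) -> float:
    """Percentage of all citations in the sample that point to `domain`."""
    total = sum(len(r.cited_domains) for r in responses)
    if total == 0:
        return 0.0
    own = sum(r.cited_domains.count(domain) for r in responses)
    return 100.0 * own / total


# Illustrative sample only; the queries and domains are made up.
sample = [
    Response("what is geo", ["example.com", "reddit.com"]),
    Response("geo vs seo", ["wikipedia.org"]),
    Response("how to measure geo", ["example.com", "youtube.com", "reddit.com"]),
    Response("llms.txt explained", ["reddit.com"]),
]

print(citation_share(sample, "example.com"))               # 2 of 4 responses -> 50.0
print(share_of_category_citations(sample, "example.com"))  # 2 of 7 citations
```

The two numbers answer different questions: the first counts answers, the second counts citations, so a domain cited once in every answer can score high on the first and low on the second.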

Engines & Crawlers

Retrieval bot · search bot: A crawler that powers real-time citations in AI chat and search (e.g. OAI-SearchBot, Claude-SearchBot, PerplexityBot). Allow these in robots.txt to stay citation-eligible.; Source: how-ai-engines-cite · ai-crawler-policy
Training bot · training scraper: A crawler that ingests content for model training (e.g. GPTBot, ClaudeBot, Google-Extended). Disallowing it opts you out of training datasets without affecting citation eligibility, because training and retrieval are controlled by separate robots.txt user-agent tokens.; Avoid: "blocking AI bots" as one action, since training and retrieval are separate tiers with separate user-agents.; Source: ai-crawler-policy
Freshness weighting: How heavily an engine favors recently published content in source selection. Perplexity weights freshness most strongly; ChatGPT and Gemini lean toward authority. Optimizing for one can suppress the other.; Source: how-ai-engines-cite
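The retrieval/training split described in this section can be checked before a robots.txt file ships. The sketch below uses Python's standard `urllib.robotparser` to confirm that retrieval bots are allowed and training bots are disallowed. The `ROBOTS_TXT` contents, the `crawl_permissions` helper, and the example URL are assumptions for illustration; the user-agent names are the ones listed in the entries above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow retrieval bots, disallow training bots.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def crawl_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user-agent, whether the rules permit fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


agents = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",
          "GPTBot", "ClaudeBot", "Google-Extended"]
permissions = crawl_permissions(ROBOTS_TXT, agents, "https://example.com/guide")

for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

This only verifies what the file says. It does not verify that a given crawler honors the file, and an agent with no matching group is treated as allowed by this parser.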

Structuring for Retrieval

Chunk: A passage (typically 256–512 tokens) produced when a RAG system splits a document. Each passage is embedded and scored against a query separately. The retrieved chunk, not the page, is what gets cited.; Avoid: "the page gets cited", since engines cite passages, not whole pages.; Source: atomic-pages-and-chunking
Answer-first writing · lead-with-the-answer: Placing a direct 40–60 word answer at the start of every H2 section before elaborating, so the chunk embedding is dominated by the answer rather than averaged across preamble. Adjusting chunk-opening text cut retrieval failures by up to 49% in Anthropic's Contextual Retrieval study.; Source: answer-first-writing
Atomic page · one concept per page: A page covering exactly one concept, so it maps cleanly to one chunk with a tight embedding. Multi-concept pages produce blended, diluted embeddings. The rule is one meaningful concept per page — not one sentence.; Source: atomic-pages-and-chunking
Assertion density: The degree to which prose uses specific, attributable facts — numbers, units, dates, sample sizes, named-source quotes — over vague qualifiers. The highest-lift single rewrite in the Princeton benchmark (Quotation +41%, Statistics +40%, Cite Sources +30%).; Avoid: fabricating statistics to raise density — no real source means weaken or remove the claim.; Source: assertion-density
Keyword stuffing: Repeating target keywords for density. A historical SEO tactic that transfers to GEO as a negative — it scored −10% source visibility in the Princeton benchmark while consuming token budget.; Source: assertion-density · seo-vs-geo
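The answer-first rule above (a direct 40–60 word answer at the start of every H2) lends itself to a quick automated check. The sketch below is a rough heuristic, not a course-supplied linter: it assumes Markdown input with `## ` headings, treats the first paragraph after each heading as the answer, and counts whitespace-separated words. The function name and sample document are hypothetical.

```python
import re


def answer_first_report(markdown: str, lo: int = 40, hi: int = 60) -> dict[str, tuple[int, bool]]:
    """Map each H2 heading to (word count of its first paragraph, within [lo, hi]?)."""
    report: dict[str, tuple[int, bool]] = {}
    # Split on lines that start with "## "; drop whatever precedes the first H2.
    sections = re.split(r"^## +", markdown, flags=re.MULTILINE)[1:]
    for section in sections:
        heading, _, body = section.partition("\n")
        # Paragraphs are separated by one or more blank lines.
        paragraphs = [p for p in re.split(r"\n\s*\n", body.strip()) if p.strip()]
        first = paragraphs[0] if paragraphs else ""
        words = len(first.split())
        report[heading.strip()] = (words, lo <= words <= hi)
    return report


# Illustrative document: one section opens with a 45-word answer, one does not.
doc = (
    "# Title\n\nIntro.\n\n"
    "## What is a chunk?\n\n" + " ".join(["word"] * 45) + "\n\nMore detail.\n\n"
    "## Background\n\nShort preamble only.\n"
)

for heading, (words, ok) in answer_first_report(doc).items():
    print(f"{heading}: {words} words, {'ok' if ok else 'outside 40-60'}")
```

Word count is only a proxy here. The check says nothing about whether the opening paragraph actually answers the heading's question, which still needs a human read.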

The Technical Layer

llms.txt · llms-full.txt: A curated Markdown index at the site root giving AI agents a pre-filtered entry point — fetch the file, pick a section, fetch only the linked pages needed. Agent navigation infrastructure, not a citation or ranking signal. The spec requires only an H1.; Avoid: expecting llms.txt to lift citations — no provider confirms reading it at inference; use schema for citation lift.; Source: llms-txt
Structured data · schema, JSON-LD: Machine-readable markup (FAQPage, HowTo, DefinedTerm) that pre-packages content in formats engines reuse. The technical lever that does lift citation: FAQPage markup is reported to yield 2.7×–3.2× gains, with the benefit accruing at indexing time, not on live fetch.; Avoid: stale or mismatched schema, because body text drifting from the markup makes engines deprioritize the page.; Source: schema-and-structured-data
Three-tier crawler taxonomy: The robots.txt model for AI crawlers: retrieval bots (allow), training scrapers (disallow), and non-compliant bots like Bytespider (CDN/WAF block). robots.txt is advisory: ChatGPT-User was exempted from it in Dec 2025, and Perplexity has been documented evading it.; Source: ai-crawler-policy
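For the structured data entry above, FAQPage markup can be generated from the same question-and-answer pairs that appear in the page body, which helps keep the markup and the text from drifting apart. The sketch below builds schema.org FAQPage JSON-LD with Python's standard `json` module. The `faq_jsonld` helper and the sample pair are hypothetical; the output would normally be embedded in a `<script type="application/ld+json">` tag.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Illustrative pair, reusing a definition from this glossary.
markup = faq_jsonld([
    ("What is a chunk?",
     "A passage, typically 256-512 tokens, that a RAG system embeds and scores against a query."),
])
print(markup)
```

Generating the markup from the body content at build time is one way to address the stale-schema risk noted above. It is a design suggestion, not something the source prescribes.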

Measurement & Strategy

Share of Model · SoM: The percentage of AI responses where your brand appears for relevant category queries. A GEO-native metric measured by repeated sampling, since engines expose no impression counts or rank API.; Source: measuring-geo-performance
Attribution gap: The blind spot where an AI-discovered visit lands days later and registers as direct traffic — the discovery touch is invisible, so GEO's contribution to a conversion can't be traced from analytics alone.; Avoid: treating citation counts as a revenue proxy — visibility is not intent.; Source: measuring-geo-performance
Topical authority · entity coverage: Comprehensive, interconnected coverage of a topic domain that makes AI systems recognize a site as the authoritative entity for a subject. Returns compound as coverage grows — many linked pages on one subject outperform one excellent page on a subtopic.; Source: topical-authority
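Because Share of Model is measured by repeated sampling, any single estimate carries sampling error. The sketch below computes the share from a list of sampled answers and attaches a Wilson score interval. The Wilson interval is a choice made here for illustration, not something the source specifies. The function name, the substring-based brand match, and the sample answers are likewise assumptions.

```python
import math


def share_of_model(responses: list[str], brand: str, z: float = 1.96) -> tuple[float, float, float]:
    """Return (share %, lower %, upper %) for brand mentions across sampled answers.

    Uses a case-insensitive substring match and a Wilson score interval
    (z = 1.96 corresponds to roughly 95% confidence).
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one sampled response")
    hits = sum(brand.lower() in r.lower() for r in responses)
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return 100.0 * hits / n, 100.0 * (center - half), 100.0 * (center + half)


# Illustrative sample: the brand "Acme" appears in 3 of 10 sampled answers.
answers = ["Acme is a common choice here."] * 3 + ["Several other tools fit this need."] * 7
share, lower, upper = share_of_model(answers, "Acme")
print(f"Share of Model: {share:.1f}% (interval {lower:.1f}% to {upper:.1f}%)")
```

With only ten samples the interval is wide, which is the practical point: small samples cannot distinguish a real change in share from noise. Substring matching will also miss misspellings and brand aliases, so a real pipeline would need a more careful mention detector.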