GEO · ~7 min
Three machine-facing files — llms.txt, JSON-LD schema, robots.txt — each do real work. The trap is expecting the wrong one to lift your citations.
Content techniques decide whether a chunk is citable. The technical layer decides whether an engine can reach and parse it at all. Three files, three distinct jobs — don't conflate them.
/llms.txt is a curated Markdown index at your site root that gives an AI agent a pre-filtered entry
point: it fetches the file, picks the relevant section, then fetches only the linked pages it needs — instead of
undirected crawling that burns context on irrelevant pages. The spec requires exactly one element:
an H1.
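The fetch-then-pick flow described above can be sketched as follows. This is a minimal illustration, not a reference parser: the sample file, the `example.com` URLs, and the helper names `parse_llms_txt` and `pages_for` are all hypothetical, and the sketch assumes the common layout of an H1 title, H2 sections, and `- [name](url): note` link lines.

```python
import re

# Hypothetical llms.txt content, used only for illustration.
SAMPLE_LLMS_TXT = """\
# Example Docs

> Hypothetical product documentation.

## Guides

- [Quickstart](https://example.com/docs/quickstart.md): install and first request
- [Authentication](https://example.com/docs/auth.md): API keys and tokens

## Reference

- [REST API](https://example.com/docs/api.md): endpoints and parameters
"""

# Matches "- [name](url)" with an optional ": note" suffix.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text):
    """Return (title, {section: [(name, url, note), ...]})."""
    title = None
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                name, url, note = match.group(1), match.group(2), match.group(3) or ""
                sections[current].append((name, url, note))
    if title is None:
        # The H1 is the one element the text above says is required.
        raise ValueError("llms.txt must contain an H1 title")
    return title, sections


def pages_for(text, section):
    """URLs an agent would fetch after choosing a single section."""
    _, sections = parse_llms_txt(text)
    return [url for _, url, _ in sections.get(section, [])]
```

Calling `pages_for(SAMPLE_LLMS_TXT, "Reference")` returns only the one linked page in that section, which is the point: the agent skips everything it did not ask for.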
No major provider (Anthropic, OpenAI, Google) has published documentation confirming that it reads llms.txt at
inference time, and a 300k-domain study found no statistical correlation between publishing the file and being cited. Its value is agent
comprehension, not ranking. Publish llms-full.txt too: the whole corpus concatenated for a
one-fetch load. And keep both files current: stale links are worse than no file.
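One way to keep the two files honest is to build llms-full.txt from the same page set the index links to, and to flag index links that no longer resolve. The sketch below is an assumption-laden illustration: `build_llms_full` and `stale_links` are hypothetical helpers, the `<!-- source: ... -->` separator is an invented convention rather than anything a spec requires, and the "live" URLs are passed in as a plain set instead of being checked over the network.

```python
import re


def build_llms_full(pages):
    """Concatenate {url: markdown_body} into one document, keeping dict order."""
    parts = []
    for url, body in pages.items():
        # The source comment is an illustrative separator, not part of any spec.
        parts.append(f"<!-- source: {url} -->\n{body.strip()}\n")
    return "\n".join(parts)


def stale_links(llms_txt, live_urls):
    """Return URLs linked from llms.txt that are not in the set of live pages."""
    linked = re.findall(r"\]\(([^)\s]+)\)", llms_txt)
    return [url for url in linked if url not in live_urls]
```

Running `stale_links` in a build step and failing on a non-empty result is one plausible way to enforce the "keep it current" rule.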
Structured data (JSON-LD) pre-packages content in the Q&A and step formats engines reuse, which reduces extraction effort at indexing time. Chatbots don't read JSON-LD on a live fetch.
Use FAQPage for Q&A, HowTo for step lists, and DefinedTerm for named concepts.
The catch: stale schema hurts. If body text drifts from the markup, engines see contradictory signals and may deprioritize the page. Using the wrong type for the content's shape (HowTo on prose, for example) gets flagged by validators too.
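A simple guard against drift is to generate the markup from the same Q&A pairs that render the page body, and to check that every answer still appears in the body. The sketch below uses the schema.org `FAQPage`, `Question`, and `Answer` types; the helper names `faq_jsonld`, `script_tag`, and `drifted_answers` are hypothetical, and the exact-substring drift check is a deliberately crude assumption.

```python
import json


def faq_jsonld(qa_pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def script_tag(data):
    """Serialize JSON-LD for embedding in an HTML page."""
    payload = json.dumps(data, ensure_ascii=False)
    # Escape "</" so answer text cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


def drifted_answers(data, body_text):
    """Return questions whose marked-up answer no longer appears in the body."""
    return [
        item["name"]
        for item in data["mainEntity"]
        if item["acceptedAnswer"]["text"] not in body_text
    ]
```

If `drifted_answers` returns anything, the body and the markup disagree, which is the contradictory-signal case described above.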
AI crawlers split into three tiers, each needing a different robots.txt response. The default for docs sites: allow retrieval, disallow training, WAF-block the non-compliant.
| Tier | Examples | Action |
|---|---|---|
| Retrieval (powers citations) | OAI-SearchBot, Claude-SearchBot, PerplexityBot | Allow |
| Training scrapers | GPTBot, ClaudeBot, Google-Extended | Disallow |
| Non-compliant | Bytespider | CDN/WAF block |
But robots.txt is advisory, not enforceable. As of OpenAI's Dec 2025 policy update,
ChatGPT-User no longer respects robots.txt; Cloudflare documented Perplexity rotating user-agents to
evade blocks. For a hard block, you need WAF rules — not a robots.txt line.
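The tier table can be turned into a robots.txt file and sanity-checked with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: `TIERS`, `build_robots_txt`, and `can_fetch` are hypothetical names, the `example.com` URL is a placeholder, and the check only shows what a compliant parser would conclude, not what any given crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Crawler tiers copied from the table above.
TIERS = {
    "allow": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
    "disallow": ["GPTBot", "ClaudeBot", "Google-Extended"],
    "waf": ["Bytespider"],  # handled at the CDN/WAF, so no robots.txt rule
}


def build_robots_txt(tiers):
    """Emit one group per robots.txt action; the WAF tier is left out."""
    lines = []
    for action, directive in (("allow", "Allow: /"), ("disallow", "Disallow: /")):
        agents = tiers.get(action, [])
        if not agents:
            continue
        lines.extend(f"User-agent: {agent}" for agent in agents)
        lines.append(directive)
        lines.append("")  # a blank line ends the group
    return "\n".join(lines)


def can_fetch(robots_txt, agent, url="https://example.com/docs/page"):
    """What a robots.txt-respecting client would decide for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these tiers, a compliant parser allows OAI-SearchBot and refuses GPTBot. Bytespider has no rule at all, so robots.txt does nothing to stop it; that tier depends entirely on the WAF.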
Retrieval practice: recall, don't peek
Question 1: The primary value of llms.txt is…
Question 2: FAQPage schema reportedly lifts AI citation by roughly…
Question 3: To stay citation-eligible while opting out of training, you…
Question 4: For a hard block of a non-compliant crawler, you need…
Question 5 · spaced recall from Lesson 04: If a claim has no real source, the right move is to…