Tactical AEO
llms.txt and robots.txt: The Two Files Every Website Needs for AEO (With Ready-to-Use Templates)
Two files at your domain root determine whether AI engines can find and understand your brand. Here's what they do, the mistakes that make brands invisible, and copy-paste templates for both.
The short answer: robots.txt controls whether AI crawlers can access your site at all. llms.txt tells them what your site contains and what your brand does. Get either one wrong and you're invisible in AI search — regardless of how good your content is. This guide covers both, with copy-paste templates you can deploy today.
There are hundreds of AEO tactics. Most take weeks or months to show results. These two take an afternoon and can produce measurable improvements in AI citation rates within days.
They're also among the most commonly botched files on the web. Brands accidentally block AI crawlers, skip llms.txt entirely, or configure both files in ways that actively hurt their AI visibility.
This is your complete guide to getting both right.
Part 1: robots.txt for AEO
What robots.txt Does
robots.txt is a plain text file at your domain root (yourdomain.com/robots.txt) that tells web crawlers which pages they can and cannot access. It's been a web standard since 1994 and is respected by all major crawlers — including every major AI bot.
The critical word is "respected." robots.txt is not a security measure — it's a convention. Legitimate crawlers follow it. Bad actors don't. For AI engines (which are legitimate), robots.txt is the authority on what they're allowed to index.
If your robots.txt blocks an AI crawler, that crawler will not index your site. No index means no citations. It's that simple.
The AI Crawlers You Need to Know
Each major AI engine has its own crawler. Here are the ones that matter for AEO:
| Crawler | AI Engine | Company |
|---|---|---|
| GPTBot | ChatGPT / OpenAI | OpenAI |
| PerplexityBot | Perplexity | Perplexity AI |
| ClaudeBot | Claude | Anthropic |
| Google-Extended | Gemini / Google AI | Google |
| Amazonbot | Alexa / Amazon AI | Amazon |
| Applebot-Extended | Apple Intelligence | Apple |
| YouBot | You.com | You.com |
| cohere-ai | Cohere | Cohere |
The Most Common robots.txt Mistake
Here is the robots.txt configuration that silently kills AEO for thousands of brands:
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

This looks reasonable: block everything by default, explicitly allow Googlebot. The problem: under the Robots Exclusion Protocol, a crawler obeys the group that matches its own user-agent, and falls back to the wildcard User-agent: * group only when no group matches. GPTBot, PerplexityBot, and ClaudeBot have no group of their own here, so they inherit Disallow: / and index nothing.
Result: Perfect Google SEO. Complete AI invisibility.
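You can reproduce this failure locally with Python's standard-library robots.txt parser, which applies the same user-agent group matching that compliant crawlers use:

```python
from urllib.robotparser import RobotFileParser

# The broken pattern: a wildcard block plus an explicit Googlebot
# allowance. Crawlers without their own group fall back to "*".
robots = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots.splitlines())

for bot in ("Googlebot", "GPTBot", "PerplexityBot", "ClaudeBot"):
    verdict = "allowed" if parser.can_fetch(bot, "https://example.com/pricing") else "BLOCKED"
    print(f"{bot}: {verdict}")
# Googlebot is allowed; all three AI crawlers are BLOCKED
```

The same check works against your own file: paste its contents into `robots` and list the crawlers you care about.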
Other Common robots.txt Mistakes
Mistake 1: Blocking AI crawlers explicitly (often by accident)
Some security or bot-blocking tools add AI crawlers to a deny list to reduce server load. Check your robots.txt for any of these:
# THIS BLOCKS AI ENGINES — remove if present
User-agent: GPTBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
Mistake 2: No robots.txt at all
A missing robots.txt means crawlers make their own decisions about what to index. Most will index everything, but an explicit file removes any ambiguity and gives you one place to declare your AI crawler policy. Always have one.
Mistake 3: Blocking your best AEO pages
Some brands block their pricing page, about page, or blog from all crawlers to prevent scraping. These are exactly the pages AI engines need to understand your brand. Never block them.
Mistake 4: Missing the llms.txt reference
Your robots.txt can point crawlers to your llms.txt with a Sitemap: line. This is a nonstandard use of the directive (it formally expects an XML sitemap), but it's harmless, and most brands don't provide the hint. It's a free signal.
robots.txt Template — General Website
Copy this, replace yourdomain.com, and customize the Disallow rules for your actual private paths:
# robots.txt for yourdomain.com
# Last updated: [date]
# ─── SEARCH ENGINES ──────────────────────
# Consecutive User-agent lines share the rules that follow.
User-agent: Googlebot
User-agent: Bingbot
Disallow: /admin/
Disallow: /api/
Disallow: /_next/
Disallow: /dashboard/
Allow: /

# ─── AI CRAWLERS — ALL EXPLICITLY ALLOWED ─
User-agent: GPTBot
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: Amazonbot
User-agent: Applebot-Extended
User-agent: YouBot
User-agent: cohere-ai
Disallow: /admin/
Disallow: /api/
Disallow: /_next/
Disallow: /dashboard/
Allow: /

# ─── ALL OTHER BOTS ──────────────────────
# A crawler with its own group above ignores this wildcard group,
# which is why the private paths are repeated in every group.
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /_next/
Disallow: /dashboard/
Allow: /

# ─── SITEMAPS ─────────────────────────────
Sitemap: https://yourdomain.com/sitemap.xml
# Nonstandard but harmless discovery hint:
Sitemap: https://yourdomain.com/llms.txt
robots.txt Template — SaaS Product (with auth-gated dashboard)
For SaaS products where the dashboard is behind login and should not be indexed:
# robots.txt for yoursaas.com
# Last updated: [date]
# ─── AI CRAWLERS — EXPLICITLY ALLOWED ────
User-agent: GPTBot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: PerplexityBot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: ClaudeBot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: Google-Extended
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: Amazonbot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: Applebot-Extended
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
# ─── SEARCH ENGINES ──────────────────────
User-agent: Googlebot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: Bingbot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
# ─── ALL OTHER BOTS ──────────────────────
User-agent: *
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
Disallow: /admin/
# ─── SITEMAPS ─────────────────────────────
Sitemap: https://yoursaas.com/sitemap.xml
Sitemap: https://yoursaas.com/llms.txt
What to customize:
- Replace `/dashboard/`, `/api/`, `/auth/` with your actual private paths
- Add any other paths that are auth-gated or irrelevant for crawlers
- Update the sitemap URL to your actual sitemap location
Part 2: llms.txt for AEO
What llms.txt Does
llms.txt is an emerging standard proposed in 2024 by Jeremy Howard (co-founder of fast.ai). It's a plain text file at your domain root (yourdomain.com/llms.txt) that gives AI language models a structured, machine-readable overview of your website.
Think of it as a cover letter for AI systems. Instead of making a crawler parse your entire site architecture to understand what you do and which pages matter — you tell it directly, in a clean format it can immediately use.
robots.txt says: here's what you can access. llms.txt says: here's what you're looking at.
Why llms.txt Matters for AEO
Faster, more accurate retrieval. AI search engines like Perplexity retrieve live web content for every query. llms.txt gives them an immediate, structured understanding of your site — which pages are most important, what your brand does, who it serves. This improves both the accuracy and confidence of citations.
Better entity clarity. A clear, machine-readable description of your brand reinforces your entity signals. AI models build higher-confidence profiles of brands with llms.txt — which increases citation probability.
Early mover advantage. As of 2025, the vast majority of websites don't have an llms.txt. Every brand that adds one now benefits from being more clearly understood by AI systems during the period when those systems are actively learning about the web.
Common llms.txt Mistakes
Mistake 1: Writing it like marketing copy
llms.txt is read by machines, not humans. Marketing language ("revolutionary platform," "best-in-class solution") adds noise and reduces clarity. Write in factual, plain language.
❌ Wrong:
Voxrank is the revolutionary AI-powered platform
transforming how forward-thinking brands dominate
the future of AI-powered search discovery.
✅ Right:
Voxrank is an AEO (Answer Engine Optimization)
platform that audits brand visibility in AI answer
engines and provides fixes to improve citation rates.
Mistake 2: Missing key URLs
The most valuable part of llms.txt for retrieval systems is the URL list. Brands often include a description but forget to list their most important pages. Include every page an AI should know about.
Mistake 3: Putting it somewhere other than the root
llms.txt must live at yourdomain.com/llms.txt — not /public/llms.txt, not /static/llms.txt. Crawlers look for it at the root only.
Mistake 4: Never updating it
llms.txt should reflect your current site. If you launch a new product, add a pricing tier, or publish a key resource — update llms.txt. Quarterly reviews are a reasonable cadence.
Mistake 5: Blocking it in robots.txt
If your robots.txt blocks /llms.txt — intentionally or via a wildcard rule — crawlers can't read it. Ensure llms.txt is explicitly allowed (the templates above handle this).
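A sketch of how to catch this mistake before it ships, using Python's standard-library parser (the bot list mirrors the crawler table in Part 1; `yourdomain.com` is a placeholder):

```python
from urllib.robotparser import RobotFileParser

def llms_txt_reachable(robots_txt: str, domain: str = "https://yourdomain.com") -> dict:
    """For each AI crawler, report whether these robots.txt
    rules let it fetch /llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    bots = ("GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended")
    return {bot: parser.can_fetch(bot, f"{domain}/llms.txt") for bot in bots}

# A careless wildcard rule silently hides the file from everyone:
print(llms_txt_reachable("User-agent: *\nDisallow: /llms\n"))
# {'GPTBot': False, 'PerplexityBot': False, 'ClaudeBot': False, 'Google-Extended': False}
```

Run it on your real robots.txt contents whenever you change the file.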
llms.txt Template — General Website
# [Your Brand Name]
> [One to two sentence factual description of what your brand
does, who it serves, and what category it belongs to.]
[Optional: 1-2 sentences of additional context — founding,
location, key differentiator.]
## What We Do
- [Key capability 1]
- [Key capability 2]
- [Key capability 3]
- [Key capability 4]
## Who It's For
- [Customer segment 1 and their core problem]
- [Customer segment 2 and their core problem]
- [Customer segment 3 and their core problem]
## Pricing
- [Plan name]: [Price] — [One line description]
- [Plan name]: [Price] — [One line description]
- [Plan name]: [Price] — [One line description]
## Key Pages
- [Page name]: [Full URL] — [One sentence description]
- [Page name]: [Full URL] — [One sentence description]
- [Page name]: [Full URL] — [One sentence description]
- [Page name]: [Full URL] — [One sentence description]
- [Page name]: [Full URL] — [One sentence description]
## About
[Founder name], [location], [founding year if relevant].
[Any notable press, awards, or trust signals worth including.]
llms.txt Template — SaaS Product
# [Product Name]
> [Product name] is a [category] platform that helps
[target customer] [achieve outcome] by [key mechanism].
Built by [Founder/Company], based in [Location], founded [Year].
## What [Product Name] Does
- [Core feature 1 — described as an outcome, not a feature name]
- [Core feature 2]
- [Core feature 3]
- [Core feature 4]
- [Core feature 5]
## Who It's For
- [ICP 1]: [specific problem they have that this solves]
- [ICP 2]: [specific problem they have that this solves]
- [ICP 3]: [specific problem they have that this solves]
## Pricing
- Free: [What's included — no credit card required if applicable]
- [Paid tier 1]: $[X]/mo — [What's included, who it's for]
- [Paid tier 2]: $[X]/mo — [What's included, who it's for]
- Enterprise: [Contact / custom pricing] — [What's included]
## Key Pages
- Homepage: https://[yourdomain].com — Overview and free audit
- Pricing: https://[yourdomain].com/pricing — Full plan comparison
- How it works: https://[yourdomain].com/how-it-works
- Blog: https://[yourdomain].com/blog — [Topic] guides and resources
- About: https://[yourdomain].com/about — Company and team
## Integrations & Stack
[Optional: list key integrations, APIs, or tech if relevant
to how buyers evaluate your product]
## About
[Founder(s)], [Company name if different from product],
[Location]. [One sentence on mission or approach if it
adds meaningful context.]
## Contact
- Support: [support email or URL]
- Sales/Enterprise: [contact email or URL]
How to Deploy Both Files
robots.txt
- Create a plain text file named `robots.txt`
- Place it at the root of your domain — `yourdomain.com/robots.txt`
- Verify it's accessible by navigating to that URL in a browser
- Test it with Google Search Console's robots.txt report (the standalone robots.txt Tester has been retired)
llms.txt
- Create a plain text file named `llms.txt` using Markdown formatting
- Place it at the root of your domain — `yourdomain.com/llms.txt`
- Verify it's accessible by navigating to that URL
- Ensure your robots.txt allows crawlers to access it (the templates above include this)
For Next.js users: Place both files in your /public directory. Next.js serves /public at the root, so /public/robots.txt is accessible at yourdomain.com/robots.txt.
For other frameworks: Place both files in whatever directory is served as your static root.
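Whichever framework serves the files, you can sanity-check that both are actually reachable after deploying. A minimal sketch using only the Python standard library; `yourdomain.com` is a placeholder:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def file_is_served(base_url: str, filename: str) -> bool:
    """Return True if base_url/filename responds with HTTP 200."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/{filename}", timeout=10) as resp:
            return resp.status == 200
    except (HTTPError, URLError):
        return False

# Replace with your real domain before running:
# for name in ("robots.txt", "llms.txt"):
#     print(name, "OK" if file_is_served("https://yourdomain.com", name) else "MISSING")
```

A 404 raises `HTTPError`, which the function reports as `False`, so a missing llms.txt shows up immediately.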
Verifying Your Setup
After deploying, run these checks:
robots.txt:
- Navigate to `yourdomain.com/robots.txt` — confirm it loads
- Review it in Google Search Console's robots.txt report (under Settings)
- Check that GPTBot, PerplexityBot, and ClaudeBot are not blocked
llms.txt:
- Navigate to `yourdomain.com/llms.txt` — confirm it loads and renders cleanly
- Ask Perplexity: "What does [yourdomain].com do?" — within a few weeks of deploying, Perplexity should give a more accurate answer
- Run a Voxrank audit — llms.txt presence is one of the 32 checked metrics
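The robots.txt checks above can be scripted against your live file. A sketch with Python's standard-library parser; the domain is a placeholder:

```python
from urllib.robotparser import RobotFileParser

def blocked_ai_bots(robots_url: str) -> list[str]:
    """Fetch a live robots.txt and return which key AI crawlers
    it blocks from the site root."""
    parser = RobotFileParser(robots_url)
    parser.read()  # network fetch; assumes the URL is reachable
    root = robots_url.rsplit("/", 1)[0] + "/"
    return [bot for bot in ("GPTBot", "PerplexityBot", "ClaudeBot")
            if not parser.can_fetch(bot, root)]

# An empty list means none of the three are blocked:
# print(blocked_ai_bots("https://yourdomain.com/robots.txt"))
```

Running this as part of a deploy pipeline catches a bad robots.txt push before crawlers see it.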
Frequently Asked Questions
Is llms.txt an official standard?
Not yet. As of 2025, llms.txt is a proposal with growing adoption but no formal W3C or IETF ratification. Perplexity has indicated support. Several major AI companies have acknowledged it. Adoption is growing fast enough that implementing it now provides real benefit, with no downside risk.
Do I need both files?
Yes. They serve different purposes and neither substitutes for the other. robots.txt without llms.txt = AI can access your site but has to figure out what it means on its own. llms.txt without a correct robots.txt = AI knows what your site contains but might be blocked from accessing it.
What if I'm on Webflow, Squarespace, or another hosted platform?
Check whether your platform supports custom file uploads at the root level. Webflow supports custom files. Squarespace has limited support — you may need to use a custom domain redirect or a DNS-level solution. WordPress users can add both files directly via FTP or a file manager plugin.
How quickly will AI engines respond to these changes?
robots.txt changes are respected almost immediately by active crawlers — within days. llms.txt takes longer to influence AI output because it depends on when the crawler next visits your site and how quickly that data feeds into the system. Perplexity typically responds within 1–3 weeks. Training data signals take longer. Run monthly query tests to track improvement.
Should I include my llms.txt in my sitemap?
You can point to llms.txt from your robots.txt with a Sitemap: line (as shown in the templates). That's a nonstandard use of the directive, which formally expects an XML sitemap, but it's harmless and helps crawlers discover the file. Do not list llms.txt inside your XML sitemap itself: sitemaps are for HTML pages, not root files.
Published in The Answer — Voxrank's publication on brand discovery in the AI era. Check whether your robots.txt and llms.txt are correctly configured with a free audit at voxrank.ai.