Tactical AEO
llms.txt and robots.txt: The Two Files Every Website Needs for AEO (With Ready-to-Use Templates)
Two files at your domain root determine whether AI engines can find and understand your brand. Here's what they do, the mistakes that make brands invisible, and copy-paste templates for both.
The short answer: robots.txt controls whether AI crawlers can access your site at all. llms.txt tells them what your site contains and what your brand does. Get either one wrong and you're invisible in AI search — regardless of how good your content is. This guide covers both, with copy-paste templates you can deploy today.
There are hundreds of AEO tactics. Most take weeks or months to show results. These two take an afternoon and can produce measurable improvements in AI citation rates within days.
They're also among the most commonly botched files on the web. Brands accidentally block AI crawlers, skip llms.txt entirely, or configure both files in ways that actively hurt their AI visibility.
This is your complete guide to getting both right.
Part 1: robots.txt for AEO
What robots.txt Does
robots.txt is a plain text file at your domain root (yourdomain.com/robots.txt) that tells web crawlers which pages they can and cannot access. It's been a web standard since 1994 and is respected by all major crawlers — including every major AI bot.
The critical word is "respected." robots.txt is not a security measure — it's a convention. Legitimate crawlers follow it. Bad actors don't. For AI engines (which are legitimate), robots.txt is the authority on what they're allowed to index.
If your robots.txt blocks an AI crawler, that crawler will not index your site. No index means no citations. It's that simple.
The AI Crawlers You Need to Know
Each major AI engine has its own crawler. Here are the ones that matter for AEO:
| Crawler | AI Engine | Company |
|---|---|---|
| GPTBot | ChatGPT / OpenAI | OpenAI |
| PerplexityBot | Perplexity | Perplexity AI |
| ClaudeBot | Claude | Anthropic |
| Google-Extended | Gemini / Google AI | Google |
| Amazonbot | Alexa / Amazon AI | Amazon |
| Applebot-Extended | Apple Intelligence | Apple |
| YouBot | You.com | You.com |
| cohere-ai | Cohere | Cohere |
The Most Common robots.txt Mistake
Here is the robots.txt configuration that silently kills AEO for thousands of brands:
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

This looks reasonable: block everything by default, explicitly allow Googlebot. The problem: under the Robots Exclusion Protocol, a crawler obeys the group that matches its own user-agent, and falls back to the wildcard User-agent: * group only when no group matches. GPTBot, PerplexityBot, and ClaudeBot have no group of their own here, so they inherit Disallow: / and index nothing.
Result: Perfect Google SEO. Complete AI invisibility.
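You can reproduce this failure locally with Python's standard-library robots.txt parser, which applies the same user-agent group matching that compliant crawlers use:

```python
from urllib.robotparser import RobotFileParser

# The broken pattern: a wildcard block plus an explicit Googlebot
# allowance. Crawlers without their own group fall back to "*".
robots = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots.splitlines())

for bot in ("Googlebot", "GPTBot", "PerplexityBot", "ClaudeBot"):
    verdict = "allowed" if parser.can_fetch(bot, "https://example.com/pricing") else "BLOCKED"
    print(f"{bot}: {verdict}")
# Googlebot is allowed; all three AI crawlers are BLOCKED
```

The same check works against your own file: paste its contents into `robots` and list the crawlers you care about.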
Other Common robots.txt Mistakes
Mistake 1: Blocking AI crawlers explicitly (often by accident)
Some security or bot-blocking tools add AI crawlers to a deny list to reduce server load. Check your robots.txt for any of these:
# THIS BLOCKS AI ENGINES — remove if present
User-agent: GPTBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
Mistake 2: No robots.txt at all
A missing robots.txt means crawlers make their own decisions about what to index. Most will index everything, but an explicit file removes any ambiguity and gives you one place to declare your AI crawler policy. Always have one.
Mistake 3: Blocking your best AEO pages
Some brands block their pricing page, about page, or blog from all crawlers to prevent scraping. These are exactly the pages AI engines need to understand your brand. Never block them.
Mistake 4: Missing the llms.txt reference
Your robots.txt can point crawlers to your llms.txt with a Sitemap: line. This is a nonstandard use of the directive (it formally expects an XML sitemap), but it's harmless, and most brands don't provide the hint. It's a free signal.
robots.txt Template — General Website
Copy this, replace yourdomain.com, and customize the Disallow rules for your actual private paths:
# robots.txt for yourdomain.com
# Last updated: [date]
# ─── SEARCH ENGINES ──────────────────────
# Consecutive User-agent lines share the rules that follow.
User-agent: Googlebot
User-agent: Bingbot
Disallow: /admin/
Disallow: /api/
Disallow: /_next/
Disallow: /dashboard/
Allow: /

# ─── AI CRAWLERS — ALL EXPLICITLY ALLOWED ─
User-agent: GPTBot
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: Amazonbot
User-agent: Applebot-Extended
User-agent: YouBot
User-agent: cohere-ai
Disallow: /admin/
Disallow: /api/
Disallow: /_next/
Disallow: /dashboard/
Allow: /

# ─── ALL OTHER BOTS ──────────────────────
# A crawler with its own group above ignores this wildcard group,
# which is why the private paths are repeated in every group.
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /_next/
Disallow: /dashboard/
Allow: /

# ─── SITEMAPS ─────────────────────────────
Sitemap: https://yourdomain.com/sitemap.xml
# Nonstandard but harmless discovery hint:
Sitemap: https://yourdomain.com/llms.txt
robots.txt Template — SaaS Product (with auth-gated dashboard)
For SaaS products where the dashboard is behind login and should not be indexed:
# robots.txt for yoursaas.com
# Last updated: [date]
# ─── AI CRAWLERS — EXPLICITLY ALLOWED ────
User-agent: GPTBot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: PerplexityBot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: ClaudeBot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: Google-Extended
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: Amazonbot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: Applebot-Extended
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
# ─── SEARCH ENGINES ──────────────────────
User-agent: Googlebot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
User-agent: Bingbot
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
# ─── ALL OTHER BOTS ──────────────────────
User-agent: *
Allow: /
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
Disallow: /admin/
# ─── SITEMAPS ─────────────────────────────
Sitemap: https://yoursaas.com/sitemap.xml
Sitemap: https://yoursaas.com/llms.txt
What to customize:
- Replace `/dashboard/`, `/api/`, `/auth/` with your actual private paths
- Add any other paths that are auth-gated or irrelevant for crawlers
- Update the sitemap URL to your actual sitemap location
Part 2: llms.txt for AEO
What llms.txt Does
llms.txt is an emerging standard proposed in 2024 by Jeremy Howard (co-founder of fast.ai). It's a plain text file at your domain root (yourdomain.com/llms.txt) that gives AI language models a structured, machine-readable overview of your website.
Think of it as a cover letter for AI systems. Instead of making a crawler parse your entire site architecture to understand what you do and which pages matter — you tell it directly, in a clean format it can immediately use.
robots.txt says: here's what you can access. llms.txt says: here's what you're looking at.
Why llms.txt Matters for AEO
Faster, more accurate retrieval. AI search engines like Perplexity retrieve live web content for every query. llms.txt gives them an immediate, structured understanding of your site — which pages are most important, what your brand does, who it serves. This improves both the accuracy and confidence of citations.
Better entity clarity. A clear, machine-readable description of your brand reinforces your entity signals. AI models build higher-confidence profiles of brands with llms.txt — which increases citation probability.
Early mover advantage. As of 2025, the vast majority of websites don't have an llms.txt. Every brand that adds one now benefits from being more clearly understood by AI systems during the period when those systems are actively learning about the web.
Common llms.txt Mistakes
Mistake 1: Writing it like marketing copy
llms.txt is read by machines, not humans. Marketing language ("revolutionary platform," "best-in-class solution") adds noise and reduces clarity. Write in factual, plain language.
❌ Wrong:
Voxrank is the revolutionary AI-powered platform
transforming how forward-thinking brands dominate
the future of AI-powered search discovery.
✅ Right:
Voxrank is an AEO (Answer Engine Optimization)
platform that audits brand visibility in AI answer
engines and provides fixes to improve citation rates.
Mistake 2: Missing key URLs
The most valuable part of llms.txt for retrieval systems is the URL list. Brands often include a description but forget to list their most important pages. Include every page an AI should know about.
Mistake 3: Putting it somewhere other than the root
llms.txt must live at yourdomain.com/llms.txt — not /public/llms.txt, not /static/llms.txt. Crawlers look for it at the root only.
Mistake 4: Never updating it
llms.txt should reflect your current site. If you launch a new product, add a pricing tier, or publish a key resource — update llms.txt. Quarterly reviews are a reasonable cadence.
Mistake 5: Blocking it in robots.txt
If your robots.txt blocks /llms.txt — intentionally or via a wildcard rule — crawlers can't read it. Ensure llms.txt is explicitly allowed (the templates above handle this).
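A sketch of how to catch this mistake before it ships, using Python's standard-library parser (the bot list mirrors the crawler table in Part 1; `yourdomain.com` is a placeholder):

```python
from urllib.robotparser import RobotFileParser

def llms_txt_reachable(robots_txt: str, domain: str = "https://yourdomain.com") -> dict:
    """For each AI crawler, report whether these robots.txt
    rules let it fetch /llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    bots = ("GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended")
    return {bot: parser.can_fetch(bot, f"{domain}/llms.txt") for bot in bots}

# A careless wildcard rule silently hides the file from everyone:
print(llms_txt_reachable("User-agent: *\nDisallow: /llms\n"))
# {'GPTBot': False, 'PerplexityBot': False, 'ClaudeBot': False, 'Google-Extended': False}
```

Run it on your real robots.txt contents whenever you change the file.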
llms.txt Template — General Website
# [Your Brand Name]
> [One to two sentence factual description of what your brand
does, who it serves, and what category it belongs to.]
[Optional: 1-2 sentences of additional context — founding,
location, key differentiator.]
## What We Do
- [Key capability 1]
- [Key capability 2]
- [Key capability 3]
- [Key capability 4]
## Who It's For
- [Customer segment 1 and their core problem]
- [Customer segment 2 and their core problem]
- [Customer segment 3 and their core problem]
## Pricing
- [Plan name]: [Price] — [One line description]
- [Plan name]: [Price] — [One line description]
- [Plan name]: [Price] — [One line description]
## Key Pages
- [Page name]: [Full URL] — [One sentence description]
- [Page name]: [Full URL] — [One sentence description]
- [Page name]: [Full URL] — [One sentence description]
- [Page name]: [Full URL] — [One sentence description]
- [Page name]: [Full URL] — [One sentence description]
## About
[Founder name], [location], [founding year if relevant].
[Any notable press, awards, or trust signals worth including.]
llms.txt Template — SaaS Product
# [Product Name]
> [Product name] is a [category] platform that helps
[target customer] [achieve outcome] by [key mechanism].
Built by [Founder/Company], based in [Location], founded [Year].
## What [Product Name] Does
- [Core feature 1 — described as an outcome, not a feature name]
- [Core feature 2]
- [Core feature 3]
- [Core feature 4]
- [Core feature 5]
## Who It's For
- [ICP 1]: [specific problem they have that this solves]
- [ICP 2]: [specific problem they have that this solves]
- [ICP 3]: [specific problem they have that this solves]
## Pricing
- Free: [What's included — no credit card required if applicable]
- [Paid tier 1]: $[X]/mo — [What's included, who it's for]
- [Paid tier 2]: $[X]/mo — [What's included, who it's for]
- Enterprise: [Contact / custom pricing] — [What's included]
## Key Pages
- Homepage: https://[yourdomain].com — Overview and free audit
- Pricing: https://[yourdomain].com/pricing — Full plan comparison
- How it works: https://[yourdomain].com/how-it-works
- Blog: https://[yourdomain].com/blog — [Topic] guides and resources
- About: https://[yourdomain].com/about — Company and team
## Integrations & Stack
[Optional: list key integrations, APIs, or tech if relevant
to how buyers evaluate your product]
## About
[Founder(s)], [Company name if different from product],
[Location]. [One sentence on mission or approach if it
adds meaningful context.]
## Contact
- Support: [support email or URL]
- Sales/Enterprise: [contact email or URL]
How to Deploy Both Files
robots.txt
- Create a plain text file named `robots.txt`
- Place it at the root of your domain — `yourdomain.com/robots.txt`
- Verify it's accessible by navigating to that URL in a browser
- Test it with Google Search Console's robots.txt report (the standalone robots.txt Tester has been retired)
llms.txt
- Create a plain text file named `llms.txt` using Markdown formatting
- Place it at the root of your domain — `yourdomain.com/llms.txt`
- Verify it's accessible by navigating to that URL
- Ensure your robots.txt allows crawlers to access it (the templates above include this)
For Next.js users: Place both files in your /public directory. Next.js serves /public at the root, so /public/robots.txt is accessible at yourdomain.com/robots.txt.
For other frameworks: Place both files in whatever directory is served as your static root.
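Whichever framework serves the files, you can sanity-check that both are actually reachable after deploying. A minimal sketch using only the Python standard library; `yourdomain.com` is a placeholder:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def file_is_served(base_url: str, filename: str) -> bool:
    """Return True if base_url/filename responds with HTTP 200."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/{filename}", timeout=10) as resp:
            return resp.status == 200
    except (HTTPError, URLError):
        return False

# Replace with your real domain before running:
# for name in ("robots.txt", "llms.txt"):
#     print(name, "OK" if file_is_served("https://yourdomain.com", name) else "MISSING")
```

A 404 raises `HTTPError`, which the function reports as `False`, so a missing llms.txt shows up immediately.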
Verifying Your Setup
After deploying, run these checks:
robots.txt:
- Navigate to `yourdomain.com/robots.txt` — confirm it loads
- Review it in Google Search Console's robots.txt report (under Settings)
- Check that GPTBot, PerplexityBot, and ClaudeBot are not blocked
llms.txt:
- Navigate to `yourdomain.com/llms.txt` — confirm it loads and renders cleanly
- Ask Perplexity: "What does [yourdomain].com do?" — within a few weeks of deploying, Perplexity should give a more accurate answer
- Run a Voxrank audit — llms.txt presence is one of the 32 checked metrics
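The robots.txt checks above can be scripted against your live file. A sketch with Python's standard-library parser; the domain is a placeholder:

```python
from urllib.robotparser import RobotFileParser

def blocked_ai_bots(robots_url: str) -> list[str]:
    """Fetch a live robots.txt and return which key AI crawlers
    it blocks from the site root."""
    parser = RobotFileParser(robots_url)
    parser.read()  # network fetch; assumes the URL is reachable
    root = robots_url.rsplit("/", 1)[0] + "/"
    return [bot for bot in ("GPTBot", "PerplexityBot", "ClaudeBot")
            if not parser.can_fetch(bot, root)]

# An empty list means none of the three are blocked:
# print(blocked_ai_bots("https://yourdomain.com/robots.txt"))
```

Running this as part of a deploy pipeline catches a bad robots.txt push before crawlers see it.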
Frequently Asked Questions
Is llms.txt an official standard?
Not yet. As of 2025, llms.txt is a proposal with growing adoption but no formal W3C or IETF ratification. Perplexity has indicated support. Several major AI companies have acknowledged it. Adoption is growing fast enough that implementing it now provides real benefit, with no downside risk.
Do I need both files?
Yes. They serve different purposes and neither substitutes for the other. robots.txt without llms.txt = AI can access your site but has to figure out what it means on its own. llms.txt without a correct robots.txt = AI knows what your site contains but might be blocked from accessing it.
What if I'm on Webflow, Squarespace, or another hosted platform?
Check whether your platform supports custom file uploads at the root level. Webflow supports custom files. Squarespace has limited support — you may need to use a custom domain redirect or a DNS-level solution. WordPress users can add both files directly via FTP or a file manager plugin.
How quickly will AI engines respond to these changes?
robots.txt changes are respected almost immediately by active crawlers — within days. llms.txt takes longer to influence AI output because it depends on when the crawler next visits your site and how quickly that data feeds into the system. Perplexity typically responds within 1–3 weeks. Training data signals take longer. Run monthly query tests to track improvement.
Should I include my llms.txt in my sitemap?
You can point to llms.txt from your robots.txt with a Sitemap: line (as shown in the templates). That's a nonstandard use of the directive, which formally expects an XML sitemap, but it's harmless and helps crawlers discover the file. Do not list llms.txt inside your XML sitemap itself: sitemaps are for HTML pages, not root files.
Published in The Answer — Voxrank's publication on brand discovery in the AI era. Check whether your robots.txt and llms.txt are correctly configured with a free audit at voxrank.ai.