What Is llms.txt? The New Standard for AI Crawlers Explained
Every few years, a new file format shows up that promises to fix something fundamental about how the web talks to machines. Most disappear within months. A few (robots.txt, sitemap.xml, ads.txt) quietly become indispensable. llms.txt is the latest candidate. Unlike most AI-era proposals, it has a clean specification, a working website at llmstxt.org, and a small but real adoption curve. It also has a complicated reality: the spec exists, but the major AI crawlers aren't actually using it yet.
Here's what llms.txt is, how it's structured, what problem it's trying to solve, and what its current status actually means for your site.
The problem llms.txt is trying to fix
Large language models have a hard architectural constraint: limited context windows. They can't read entire websites the way a human can browse over time. When an AI engine wants to understand what your site is about, or pull information from your documentation, it has to work through a small window that can't hold every page at once.
The official llms.txt proposal frames it directly. Per the spec on llmstxt.org, the file serves as "a /llms.txt markdown file to provide LLM-friendly content" that includes background information, guidance, and links to detailed resources. Instead of converting complex HTML to plain text and hoping the AI parses it correctly, llms.txt offers a curated, concise alternative specifically built for AI consumption.
Think of it as a sitemap-for-AI, not a robots.txt-for-AI. It's not about blocking crawlers. It's about giving them a guided tour of your most important content, in a format they can actually read efficiently.
The format is markdown: clean, simple, structured
The specification prescribes a precise markdown structure with required and optional components, in this order:
- H1 heading (required): the project or site name
- Blockquote (optional): a brief summary with essential context
- Body text (optional): additional project details and interpretation guidance
- H2-delimited sections (optional): file lists with curated URLs
Each file list is a markdown list containing a required markdown hyperlink [name](url), optionally followed by a colon and a short note about the file.
Here's what a minimal llms.txt looks like in practice:
```
# Acme Documentation

> Acme is a workspace tool for engineering teams. This file points
> AI systems toward our most authoritative documentation.

## Getting Started

- [Quickstart guide](https://acme.com/docs/quickstart): Set up your first project in 5 minutes
- [Installation](https://acme.com/docs/install): Installing the Acme SDK in your stack

## API Reference

- [Authentication](https://acme.com/docs/api/auth): How to authenticate API requests
- [Endpoints](https://acme.com/docs/api/endpoints): Full API endpoint reference

## Optional

- [Changelog](https://acme.com/docs/changelog): Historical release notes
- [Migration guides](https://acme.com/docs/migrations): For users coming from older versions
```
It's human-readable and machine-parseable at the same time. Markdown is exactly the format LLMs were trained most heavily on, which is kind of the whole point. You're giving the AI a document in its native language.
The "Optional" section has a special meaning
One easy-to-miss detail: the section literally titled "Optional" carries special semantic weight. The spec says "if it's included, the URLs provided there can be skipped if a shorter context is needed." Translation: tell the AI these are useful if there's room, but it's fine to skip them when context is tight.
This is a nice touch. It acknowledges that LLMs make tradeoffs about what they can read, and it gives you a way to signal which parts of your documentation are essential vs. nice-to-have. That's a small but genuinely useful signal.
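To make the semantics concrete, here's a sketch of how a consumer might act on that signal, assuming a simple character budget as a stand-in for a token budget (the `trim_for_context` helper is hypothetical):

```python
def trim_for_context(llms_txt: str, max_chars: int) -> str:
    """Drop the 'Optional' section first when the file exceeds a context budget."""
    if len(llms_txt) <= max_chars:
        return llms_txt
    # Split on H2 headings, keeping each heading with its section body.
    parts = llms_txt.split("\n## ")
    head, sections = parts[0], parts[1:]
    # Per the spec, only the section titled exactly "Optional" is skippable.
    kept = [s for s in sections if s.split("\n", 1)[0].strip() != "Optional"]
    return "\n## ".join([head] + kept)  # may still exceed budget; Optional is just the first cut
```

Everything outside the "Optional" section is treated as essential, so it survives the trim untouched.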
The .md companion convention
The proposal also recommends a complementary standard: websites should provide markdown versions of their pages by appending .md to the URL. So /docs/quickstart becomes /docs/quickstart.md for AI consumption. This lets LLMs fetch individual documentation pages in clean markdown form, without having to parse the HTML wrapper around them.
Some doc platforms (notably GitBook) generate these .md companions automatically. For most sites, it's a build-time task: render each page to both HTML and markdown, serve both at parallel URLs, and let the AI engine pick which to fetch.
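The URL mapping itself is trivial. A sketch of the rewrite, with one caveat: the proposal doesn't say what directory-style URLs map to, so the `/index.md` fallback below is purely an assumption for illustration (as is the `markdown_companion` helper name):

```python
from urllib.parse import urlsplit, urlunsplit

def markdown_companion(url: str) -> str:
    """Return the .md companion URL per the llms.txt proposal's convention."""
    scheme, netloc, path, query, frag = urlsplit(url)
    if not path or path.endswith("/"):
        # ASSUMPTION: the spec doesn't cover directory URLs; map them to index.md
        path = path.rstrip("/") + "/index"
    return urlunsplit((scheme, netloc, path + ".md", query, frag))
```

An AI engine (or your own tooling) can then try the `.md` variant first and fall back to the HTML page if it 404s.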
How llms.txt differs from robots.txt
This is the most common confusion, and the spec is explicit about it. llms.txt is not an access-control file. It doesn't tell crawlers what they can or can't access. It doesn't replace robots.txt. The two files do completely different jobs:
- robots.txt controls which crawlers can access which URLs (allow/deny)
- llms.txt recommends to AI systems which content is most authoritative and should be prioritized
You can, and probably should, have both. They don't conflict; they answer different questions. robots.txt answers "are you allowed to crawl this?" while llms.txt answers "if you do crawl this site, where should you start?"
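In practice, the two files sit side by side at the site root. A robots.txt that admits an AI crawler while fencing off private paths might look like this (GPTBot is OpenAI's real crawler token; the paths are illustrative), with an llms.txt like the Acme example above alongside it to handle the "where should you start?" half:

```
User-agent: GPTBot
Allow: /docs/
Disallow: /internal/

User-agent: *
Disallow: /internal/
```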
The current adoption reality
Here's the part most llms.txt advocates leave out. As of mid-2025, adoption was minimal: Semrush reported that only 951 domains, a tiny fraction of the web, had published an llms.txt file. More importantly, Semrush's testing found that the major AI crawlers (Google-Extended, GPTBot, PerplexityBot, and ClaudeBot) weren't actually accessing the llms.txt files on those domains.
The honest summary: llms.txt is "currently just a proposed standard rather than something that's actually being used by the major AI companies." That matters. The format exists. The spec is clean. Some sites have implemented it. The big crawlers aren't acting on it yet.
Should you implement it anyway?
The case for implementing llms.txt despite low adoption is basically "low cost, asymmetric upside":
- Generating an llms.txt file takes 30 minutes for most sites and is fully automatable.
- It costs nothing to host (it's a single small markdown file at /llms.txt).
- If adoption picks up, you're already prepared.
- The exercise of writing one forces you to think about which parts of your site are actually authoritative and how they relate to each other, which is useful even if no AI ever reads the file.
- Some smaller AI tools, agents, and SDKs already do read llms.txt, even if the major crawlers don't yet.
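"Fully automatable" is not an exaggeration. Here's a minimal generator sketch, assuming you maintain a mapping of section names to (name, url, note) tuples; the `generate_llms_txt` function and its inputs are hypothetical, and real setups would typically pull this data from a sitemap or docs build:

```python
def generate_llms_txt(title, summary, sections):
    """Render an llms.txt file from a title, summary, and {section: [(name, url, note)]}."""
    out = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out.append(f"## {heading}")
        for name, url, note in links:
            # note is optional per the spec: "[name](url)" with or without ": note"
            out.append(f"- [{name}]({url}): {note}" if note else f"- [{name}]({url})")
        out.append("")
    return "\n".join(out).rstrip() + "\n"
```

Wire this into your docs build and the file stays current with zero ongoing effort, which is most of the "low cost" half of the argument.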
The case against: don't expect anything to happen. Implementing llms.txt today probably won't move the needle on your AI visibility in ChatGPT, Gemini, or Perplexity, because those engines aren't reading it. If you're choosing between llms.txt and any other GEO investment, the others have higher impact right now.
The pragmatic recommendation
Treat llms.txt as low-priority maintenance work, not a primary GEO investment. If your documentation platform auto-generates it (like GitBook), let it run. If not, generate one when you have a slow week and host it. Update it when you ship major new docs.
Don't expect immediate citation lift. Don't market it internally as a major GEO win. Don't let it crowd out higher-impact work like improving content structure, building authority, or refreshing old pages. If the major AI crawlers start adopting llms.txt seriously, which is plausible but not guaranteed, you'll already be ready. If they don't, you've spent a small amount of effort on clean documentation hygiene.
llms.txt is a real proposal with a real spec and real potential. It's just not yet a real market force. Implement it accordingly.
Ready to decide whether it's right for your site? See Should You Use llms.txt? An Honest Pros and Cons Analysis.