What is llms.txt? The spec, the use case, and how to write one
llms.txt is a markdown file that tells AI engines how your site is structured. The spec, why it matters, how to write one, and Sivon HQ's own llms.txt as a worked example.

llms.txt is the smallest piece of AI search optimization work you can do that's not zero, and it's worth doing on day one of any site that cares about being cited in ChatGPT, Perplexity, or Google AI Overviews.
It's a single markdown file at your site root. It takes 20 minutes to write. It doesn't move rankings on its own — but it's a clean signal of AI-readiness, it gives generative engines a map of what's worth indexing, and adoption across SaaS and content sites has been climbing through 2025 and into 2026.
This post is the working spec, plus the four-step recipe we use to write one, plus Sivon HQ's own llms.txt as a worked example.
The spec, in one paragraph
llms.txt was proposed by Jeremy Howard at Answer.AI in September 2024. The proposal: a markdown file at /llms.txt on the root of any site, structured as a brief site description plus a list of links to important pages, written in a format an LLM can read in a single pass. It's modelled loosely on robots.txt (which it doesn't replace) and on sitemap.xml (which it also doesn't replace). The format is markdown, the location is fixed, and the structure is open enough that "doing it right" is mostly about discipline rather than spec compliance.
The proposal site, llmstxt.org, is the canonical reference. It includes the format spec, examples from early adopters, and a directory of sites that have shipped one.
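Concretely, the format reduces to a short skeleton. Everything below is placeholder names and URLs; per the proposal, the load-bearing pieces are the H1, the blockquote summary, and the "Optional" heading, which marks links an engine can drop when context is tight:

```markdown
# Example Co

> One-paragraph summary: what the product is, who it's for, what's different.

Optional free-form paragraphs of extra context go here.

## Core pages

- [Home](https://example.com/): One sentence on what the page covers.
- [Pricing](https://example.com/pricing): Plans and tiers.

## Optional

- [Blog](https://example.com/blog): Lower-priority links, skippable for short context.
```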
What problem llms.txt actually solves
Crawlers are good at discovering pages. They are not good at deciding which pages on your site matter. A homepage, a pricing page, a few core blog posts, a manifesto, and an about page are not equally valuable signals about who you are — but a generic crawler treats them as if they are.
llms.txt is the curation layer. You're saying: of the 200 pages on this site, here are the 40 that matter for understanding what we do, in roughly this order of importance, with one-sentence context per page. An AI engine reading the file can build a much sharper internal model of your business than it could from crawling unaided.
The mental model is closer to a README than to a sitemap. A sitemap.xml is exhaustive ("here's every URL"). A robots.txt is access control ("here's where you can go"). An llms.txt is editorial ("here's what's worth reading and what each page is for"). The three coexist; they don't substitute.
llms.txt vs robots.txt vs sitemap.xml
A short comparison, since this is the question every team asks first:
| Feature | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Purpose | Access control | URL discovery | Site curation for LLMs |
| Format | Plain text directives | XML | Markdown |
| Audience | Any web crawler | Search engines | AI engines / LLMs |
| Governing standard | RFC 9309 | sitemaps.org protocol | Proposed (llmstxt.org), no enforcement |
| Should you ship one | Yes, every site | Yes, every site | Yes, if you care about AI search |
You ship all three. They're complementary, not competing. The most common mistake is to think llms.txt is a replacement for sitemap.xml — it isn't. AI engines that find your site through traditional crawling still benefit from a complete sitemap.xml. llms.txt is the additional curation signal on top.
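If you want to verify all three are actually live, here's a minimal sketch, assuming Python 3 and nothing beyond the standard library (the domain is a placeholder):

```python
import urllib.error
import urllib.request

SITE = "https://example.com"  # placeholder: swap in your own domain

# The three root files: access control, URL discovery, LLM curation.
for path in ("/robots.txt", "/sitemap.xml", "/llms.txt"):
    try:
        with urllib.request.urlopen(SITE + path, timeout=10) as resp:
            print(f"{path}: {resp.status}")
    except urllib.error.HTTPError as err:
        print(f"{path}: {err.code} (missing or blocked)")
    except urllib.error.URLError as err:
        print(f"{path}: unreachable ({err.reason})")
```

A 200 on all three is the baseline; anything else goes on the fix list.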
Why it matters in 2026
Three reasons it's worth doing now, even though no engine technically requires it.
1. AI engines are reading the open web differently. GPTBot, ClaudeBot, PerplexityBot, and Google-Extended crawl your site for different reasons than Googlebot does. They're not building a search index — they're building a retrieval corpus. A curated llms.txt is a higher-density signal than letting them infer your site structure from internal links and meta descriptions. We've watched fresh sites get pulled into Perplexity's citation pool within days of shipping llms.txt, whereas comparable sites without one took weeks.
2. The format is converging on a soft standard. When a few hundred high-authority sites ship a file with the same structure, the file becomes a recognisable pattern that engines specifically look for. We're past the early-experiment phase — adoption is wide enough that engines have reasons to weight the file's presence. It doesn't single-handedly rank you, but it's the kind of signal where being absent looks worse than being present.
3. It's a forcing function for editorial clarity. Writing your llms.txt requires you to answer: what is the one-paragraph version of this product? What are the 6–10 pages that genuinely matter? If you can't write the file in 20 minutes, your site's information architecture is too sprawling. The exercise itself surfaces problems that would otherwise stay hidden.
How to write one in four steps
Skip the spec for a minute. The four moves we make every time:
Step 1: write a one-paragraph product summary
Start with # [Brand Name] as the H1, then write a single paragraph that answers: what is this product, who is it for, what's the differentiator. No marketing fluff. The model is going to read this paragraph and use it to disambiguate you from every similarly-named tool in its training data, so it has to be specific.
A bad version: "Sivon HQ is an AI-powered platform that helps businesses with marketing."
A better version: "Sivon HQ is an AI marketing team that knows your business. Set up your Brand Blueprint once — get a diagnosis, a ranked fix list, and assets across content, social, ads, and outreach that sound like you, not like AI."
The second version disambiguates with concrete capabilities, names the artefact (Brand Blueprint), and signals the audience. The first could describe 200 tools.
Step 2: list the core pages
Five to ten links, with one sentence per link describing what the page is for. Order them by importance to a buyer trying to understand the product, not by order in your nav.
## Core pages
- [Home](https://sivonhq.com/): Product overview, positioning, and the four engines.
- [Pricing](https://sivonhq.com/pricing): Free, Starter, Pro, Agency tiers.
- [About](https://sivonhq.com/about): Why Sivon HQ exists, how it works, who it's for.
- [Manifesto](https://sivonhq.com/manifesto): What we believe about marketing and AI.

The link text is the page name. The colon-separated suffix is the context. Both matter. The model reads both.
Step 3: list the product surface
A separate section that maps your features or product modules. This is where you describe what the tool actually does, with one line per capability. For SaaS sites, this section is the densest signal — when a buyer asks ChatGPT "what does Sivon HQ do," the answer is mostly synthesised from this section.
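The shape, with placeholder feature names (Sivon HQ's real version is in the worked example below):

```markdown
## Product

- **Feature name** — what it does and what it produces, in one concrete line.
- **Another feature** — the input it takes, the output a user gets.
```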
Step 4: list resources
Blog, changelog, docs, podcasts, anything else that's evergreen and useful for understanding the brand. Keep it tight. The temptation is to list every blog post; resist. List the index pages and let the engine crawl from there.
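The shape (the /blog index is ours; the other path is illustrative):

```markdown
## Resources

- [Blog](https://sivonhq.com/blog): Index of all posts; engines crawl individual posts from here.
- [Changelog](https://sivonhq.com/changelog): Product updates, newest first.
```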
End the file. Don't overthink it. The whole file should be 30–80 lines for most sites.
Sivon HQ's llms.txt as a worked example
Our llms.txt is roughly 40 lines. Here's the structure:
# Sivon HQ
> An AI marketing team that knows your business. Set up your Brand Blueprint
> once — get a diagnosis, a ranked fix list, and assets across content, social,
> ads, and outreach that sound like you, not like AI.
Sivon HQ is built for small marketing teams (2–10 people), in-house marketers
at SMBs, and small agencies — operators who are tired of re-explaining their
product, audience, and tone to ChatGPT every time. The differentiator is
**persistent business context**: every output reuses the same Brand Blueprint,
so nothing reads as generic AI.
The product is organised into three layers: **Diagnosis** (score what's wrong),
**Engines** (fix it with one context-aware brain across every channel), and
**Assets** (a searchable library of everything generated).
## Core pages
- [Home](https://sivonhq.com/): Product overview, positioning, and the four engines.
- [Pricing](https://sivonhq.com/pricing): Free, Starter, Pro, Agency tiers. 7-day Pro trial.
- [About](https://sivonhq.com/about): Why Sivon HQ exists, how it works, who it's for.
- [Manifesto](https://sivonhq.com/manifesto): What we believe about marketing and AI context.
## Product
- **Brand Blueprint** — captures product, audience, voice, positioning, competitive context.
- **Diagnosis** — multi-lens audit producing a scored gap list.
- **Content Studio** — long-form blog and SEO content grounded in the Blueprint.
- **Social Composer** — post-by-post social generation in your brand voice.
- **Ad Workbench** — paid ad copy and creative for Google, Meta, LinkedIn.
- **Outreach Sequencer** — cold outbound and reply-handling sequences.

A few notes on design choices:
- The blockquote at the top is the spec-suggested format for the one-paragraph summary. Models read it as a tagline, which is how you want them to use it.
- The Core pages section uses the canonical capitalisation and the actual page titles. Don't restructure your nav for this — mirror it.
- The Product section uses bold for the feature name and dash-separated description. We've found this format extracts more cleanly than colon-separated.
- No blog posts listed individually. The file points to the /blog index. If we listed every post, the file would be 200 lines and the high-signal pages would get diluted.
Common mistakes
A short list, all of which we've seen on real sites:
Listing every URL. llms.txt is editorial, not exhaustive. If it has more than 80 lines, you're confusing the model and diluting the signal of your most important pages. Trim aggressively.
Marketing copy in the one-paragraph summary. "Revolutionary AI-powered platform" is invisible to a model. Write the way you'd brief a new hire on day one: what is this, who's it for, what makes it different.
Forgetting the link descriptions. A bare list of URLs is much less useful than a list with one-sentence context per link. The colon-separated suffix is doing real work.
Leaving stale URLs. If you redesign the site, update llms.txt the same day. AI engines cache aggressively, but they also re-fetch — a 404 on a link in your llms.txt is a credibility hit. A link-check sketch follows at the end of this list.
Linking to gated or auth-required pages. The point of llms.txt is curation of publicly readable content. Don't list /dashboard or /admin even if they're product surfaces.
Inconsistent brand spelling. Pick one canonical name ("Sivon HQ", not "Sivon" or "SivonHQ") and use it everywhere in the file. Entity confidence is a signal AI engines weight, and llms.txt is one of the highest-trust references the model has for how to spell your brand.
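On the stale-URL point, the check is trivially scriptable. A minimal sketch, assuming Python 3 and the standard library only (the domain is a placeholder): fetch llms.txt, extract every markdown link, HEAD each one.

```python
import re
import urllib.error
import urllib.request

SITE = "https://example.com"  # placeholder: swap in your own domain

# Fetch llms.txt and pull out every markdown link target.
with urllib.request.urlopen(SITE + "/llms.txt", timeout=10) as resp:
    body = resp.read().decode("utf-8")
links = re.findall(r"\[[^\]]*\]\((https?://[^)\s]+)\)", body)

# HEAD each link; anything non-2xx is a stale entry worth fixing.
# (A few servers reject HEAD; double-check any 405 with a normal GET.)
for url in links:
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            print(f"OK  {resp.status}  {url}")
    except urllib.error.HTTPError as err:
        print(f"BAD {err.code}  {url}")
    except urllib.error.URLError as err:
        print(f"ERR {err.reason}  {url}")
```

Run it after every redesign or URL migration; any BAD line is the credibility hit described above.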
When to add llms-full.txt
llms-full.txt is the long-form variant — a single markdown file with every important page on your site concatenated, intended for retrieval ingestion rather than navigation. It's overkill for most sites. We recommend it only when:
- You have stable, high-quality content you want cited verbatim.
- Your site has fewer than ~200 pages so the file stays manageable.
- You've already shipped llms.txt and the seven foundational fixes covered in the AI search optimization guide.
Above 200 pages, a structured llms.txt index is a better investment than a giant llms-full.txt. Below that, the long-form file is a reasonable third tier of effort. Treat it as optional, not as the next logical step.
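For shape only: the proposal doesn't mandate a structure for the long-form file, but the common pattern is concatenated page content under per-page headings, roughly:

```markdown
# Sivon HQ: full content

## About (https://sivonhq.com/about)

[Full markdown content of the About page, inlined.]

## Manifesto (https://sivonhq.com/manifesto)

[Full markdown content of the Manifesto, inlined.]
```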
What to do next
Three paths from here:
- If you haven't shipped llms.txt yet, write the four-step version above. Twenty minutes. Ship it.
- If you have one but it's stale, audit the links, update the product section, and add a "Last updated" line at the top as a markdown (HTML) comment (example after this list).
- If you want the rest of the AI search optimization stack, the pillar guide covers the seven fixes that compound: schema depth, entity hygiene, citation-worthy passages, brand mentions, freshness, and the rest. Or skip ahead to generative engine optimization for the academic framework, or how to rank in ChatGPT for the engine-specific playbook.
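On that "Last updated" line: markdown has no comment syntax of its own, so the usual move is an HTML comment at the top of the file (the date here is a placeholder):

```markdown
<!-- Last updated: 2026-02-15 -->

# Sivon HQ

> An AI marketing team that knows your business. ...
```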
If you want this audit run for you across all four AI search surfaces — llms.txt plus the rest of the stack — that's the AI Visibility engine in Sivon HQ. It scores your site against the seven highest-impact fixes and outputs a ranked work list.
Twenty minutes for llms.txt is the most overdue chore on most marketing sites' to-do list. The cost of doing it is rounding error. The cost of not doing it is six months of citation pickup that never happens.