AI Marketing · Nishil Bhave · 21 min read

AI search optimization: the 2026 guide to getting cited by ChatGPT, Perplexity, and Google AI Overviews

AI search optimization is how you get cited by ChatGPT, Perplexity, and Google's AI Overviews. The full 2026 playbook — surfaces, signals, and a 7-step audit.

Nishil Bhave
Founder, Sivon HQ

There's a question worth asking before you write another blog post: when a buyer asks ChatGPT "what's the best alternative to Jasper", does your site come up?

If you don't know, the honest answer is probably no. Most sites are organised for the old internet — the one where Google sent ten blue links and your job was to be one of them. The new internet sends a paragraph with three citations, and the citations are the traffic.

This guide is the working playbook we use at Sivon HQ to optimize a site for citations across ChatGPT, Perplexity, Google AI Overviews, and Claude. It's the same framework our AI Visibility engine runs as an audit. It's long on purpose — the cluster of supporting posts that go deeper on each surface is linked at the end.

Key takeaways

  • Generative engines have changed the unit of search traffic from "rank in ten blue links" to "be cited in the generated answer." The optimization stack overlaps with SEO but is not identical.
  • Four surfaces matter in 2026: ChatGPT search, Perplexity, Google AI Overviews, and Claude. Each has different citation rules. Optimizing for one without the others leaves traffic on the table.
  • The seven highest-impact fixes — in rough order of effort vs. payoff — are: an llms.txt file, deeper schema markup, entity hygiene, citation-worthy passages, brand mentions across the open web, freshness signals, and an llms-full.txt for retrieval.
  • You can measure most of this manually with branded prompts and a spreadsheet. Tooling exists; none of it is yet a "Google Search Console for AI search."

Table of contents

  1. Where AI search optimization fits
  2. The four AI search surfaces in 2026
  3. How AI engines decide who gets cited
  4. The seven things to fix
  5. How to measure AI search visibility
  6. Worked example: auditing sivonhq.com
  7. Where to go next

Where AI search optimization fits

Three terms get used interchangeably and shouldn't be: AI search optimization, generative engine optimization (GEO), and LLM SEO. We use AI search optimization as the umbrella, because most operators don't say "generative engine" out loud — they say "I want to show up in ChatGPT."

Traditional SEO competes for the blue-link slot. The unit of value is a ranking. The optimization stack — keywords, intent, on-page, internal links, backlinks, technical health — has been settled for two decades.

AI search optimization competes for the citation inside a generated answer. The unit of value isn't a position; it's whether your URL ends up in the model's cited sources panel. Ranking concepts still apply (authority, relevance, freshness, structured data), but several new signals matter that didn't matter before:

  • Passage-level extractability. Generative engines lift sentences, not pages. A page that ranks #1 on Google but reads as 1,800 unbroken words of marketing prose is harder to cite than a #6 page with crisp Q&A blocks.
  • Entity clarity. When the model is deciding whether "Sivon HQ" and "sivonhq.com" are the same thing, every signal — Organization schema, sameAs links, consistent brand mentions across the open web — changes the answer.
  • Citation worthiness. Concrete numbers, named sources, primary research, and quotes from named people get cited at higher rates than generic prose.

This is the territory the generative engine optimization paper from Aggarwal et al. (Princeton, IIT Delhi, Georgia Tech, Allen Institute, 2023) staked out academically — they tested nine on-page interventions and found that adding citations, statistics, and quotations produced the three largest visibility lifts in generative search results. The numbers in that paper are early and the engines have moved since, but the directional finding has held up across every audit we've run.

The biggest mental shift: you're no longer optimizing for a search engine. You're optimizing to be a source the model trusts. That's a different job.

A few terminology notes, since the field is still settling. GEO (generative engine optimization) is the academic term-of-art, used in the original Princeton paper and increasingly in industry conferences. LLM SEO and answer engine optimization (AEO) are practitioner terms that float around marketing Twitter and LinkedIn — they mean roughly the same thing as GEO. AI search optimization is the term most marketing teams use when they actually have to brief work; it survives the "explain this to your CEO" test better than the others. We use AI search optimization in client conversations and GEO in technical writing. None of them are wrong; pick one and be consistent so your team isn't relitigating vocabulary every meeting.

What's the same as traditional SEO: the core authority signals (domain reputation, link equity, indexability), structured data, semantic relevance, page experience, and the content-quality bar. What's different: the citation-vs-ranking shift, the weight on passage-level extractability, the entity-clarity demand, and a fleet of AI-specific crawlers that need to be managed alongside Googlebot and Bingbot. Sites that already do good SEO have a 60–70% head start on AI search optimization. The remaining 30–40% is the work this guide describes.

The four AI search surfaces in 2026

Four engines own the search experience that touches buyers. They behave differently. The fastest way to waste an afternoon is optimizing for "AI search" as if it were one thing.

ChatGPT search. OpenAI launched ChatGPT search on October 31, 2024 and rolled it out broadly through Q4. ChatGPT search blends OpenAI's own retrieval index with web results, surfaces inline links, and shows a citations panel for most factual queries. Crawled by GPTBot, OAI-SearchBot, and ChatGPT-User (see OpenAI's bot docs). ChatGPT picks citations conservatively — three to six sources for most answers, weighted toward established domains and content with high passage clarity. Deep dive: ranking in ChatGPT.

Perplexity. A retrieval-first engine — its product is the citation. Perplexity surfaces 5–10 sources per answer with explicit citation numbers and a sources panel that gets clicked at meaningful rates. Crawled by PerplexityBot. Perplexity weights authority and recency aggressively, but a brand-new site with an indexable answer to a niche query can show up the same week it ships. Deep dive: Perplexity citation criteria.

Google AI Overviews. Google's generative answer block, introduced at I/O in May 2024 and now appearing on a meaningful fraction of US search queries. AI Overviews are not a new index — they pull from the same Google index that ranks blue links — but the citation set is narrower (3–5 sources) and the trigger conditions are different. Informational, comparison, and "best of" queries trigger Overviews more often than transactional queries. Deep dive: winning the AI Overview slot.

Claude. Anthropic's Claude added native citations and web search through 2025. Claude is more conservative about citing the open web than ChatGPT but more rigorous about not hallucinating attribution — when it cites you, the citation usually goes to a specific passage, not a vague summary. Crawled by ClaudeBot and anthropic-ai. Most Claude usage today is via the API and Claude Desktop; consumer search volume is smaller than ChatGPT but growing.

The four surfaces share a backbone — they all benefit from clean structured data, recent dates, and clear entity signals — but the citation mechanics diverge. The pillar of any AI search strategy is: be cite-able everywhere; don't pick a favourite engine. The work to be cite-able for Perplexity is mostly the same work that makes you cite-able for ChatGPT, with a few engine-specific levers covered in the deep-dive posts.

A short comparison of how the four surfaces differ in practice:

| Surface | Citations per answer | Recency weight | Authority weight | Brand-new-site friendliness |
| --- | --- | --- | --- | --- |
| ChatGPT search | 3–6 | Medium | High | Medium — needs entity confidence |
| Perplexity | 5–10 | High | Medium | High — newest, easiest to crack |
| Google AI Overviews | 3–5 | Medium | Very high | Low — Google's authority bias persists |
| Claude | 2–4 | Medium | High | Medium — small consumer footprint |

The table is a generalisation — every audit has surprises — but the pattern holds. If you have a 6-month-old domain and want fastest pickup, optimize first for Perplexity, then for ChatGPT. If you have an authority-heavy 5-year-old domain that already ranks on Google, you'll see AI Overview pickup faster than the rest. There's no universal "do this first."

One thing the table doesn't show: the engines update fast. ChatGPT search shipped in Q4 2024 and added at least three citation-mechanic changes in the first year. Whatever ranking heuristics you internalise this quarter will be partly wrong next quarter. The durable bet is on the underlying signal categories — entity clarity, passage extractability, structured data — not on the surface-specific tactics.

How AI engines decide who gets cited

Every engine guards the exact mechanics, but the consensus signals are well-documented enough to act on. Six factors determine whether your URL shows up in the citations panel.

1. Indexability. This sounds boring and it is. The crawler has to reach your page. AI crawlers are a different fleet than Googlebot — GPTBot, ClaudeBot, PerplexityBot, plus Google-Extended (a robots.txt token that controls how Google's AI models may use your content, rather than a separate crawler). Many sites still have these blocked in robots.txt from a panicked summer-2024 decision, and they've never unblocked them. If a page isn't crawled, it cannot be cited. We start every audit by listing the AI bots and confirming each is allowed.
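For reference, here's what an allow stance for the AI fleet looks like in robots.txt. A minimal sketch — the user-agent tokens are the ones each vendor documents today, and the list changes, so check the current crawler docs before copying:

```
# Allow the AI search crawlers alongside Googlebot and Bingbot.
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Robots.txt token controlling AI use of content, not a separate crawler.
User-agent: Google-Extended
Allow: /
```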

2. Entity confidence. AI engines need to be sure your brand is one consistent thing. They get that confidence from: a clean Organization schema block on every page, a sameAs array linking your site to LinkedIn, Twitter, Crunchbase, Wikidata, and other identity sources, consistent brand spelling and capitalisation across the open web, and Wikipedia or Wikidata presence at higher tiers. Brand-new sites can build entity confidence in months by being prolific on a few well-indexed surfaces (your own blog, your founder's LinkedIn, GitHub READMEs, podcast episodes that get transcribed).

3. Passage extractability. Models lift sentences, not pages. A passage that answers a question in one or two sentences, with a concrete number or named source, gets cited at a much higher rate than the same answer buried in five paragraphs. The simplest fix: write a one-sentence answer to your H2 question, then expand. Don't make the model dig.

4. Structured data depth. FAQPage, HowTo, Article, Product, Organization, BreadcrumbList. Every schema block you add gives the engine more confidence about what the page is about and how to extract it. We've seen pages with the right FAQPage schema get cited verbatim — the question and answer pulled directly from the schema's mainEntity array.

5. Brand mentions across the open web. Engines read forums, podcast transcripts, GitHub READMEs, Reddit threads, and other open-web content as part of training and retrieval. A site with strong on-page signals but zero brand mentions on Reddit or podcasts looks thinner than a site with the same pages plus 50 mentions across well-indexed sources. This is closer to traditional digital PR than to SEO, but it's now part of the AI search optimization job.

6. Freshness. All four engines weight recency. A page with datePublished in 2022 and no dateModified reads as stale. The same page with dateModified: 2026-05-01 and a few new paragraphs reads as maintained. Refresh dates aren't a hack — they're a signal that the content is current. We update our top 10% of pages quarterly minimum, with a real edit, not just a date bump.
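In schema terms the fix is a single extra property. A minimal Article sketch carrying both dates — the headline and publish date are illustrative, echoing the 2022-published, 2026-refreshed example above:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "AI search optimization guide",
  "datePublished": "2022-03-15",
  "dateModified": "2026-05-01"
}
</script>
```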

These six factors aren't equally weighted. The order above is roughly our experience of impact: indexability and entity confidence are foundational; passage extractability and structured data are the highest-effort-but-highest-payoff levers; brand mentions and freshness compound over time.

The seven things to fix

Ordered roughly by effort-to-impact. Items 1–3 are most sites' biggest gaps. Items 4–7 are how the top 10% pull away.

1. Add an llms.txt file

The lowest-effort, highest-symbolic-value fix. llms.txt, proposed by Jeremy Howard at Answer.AI in September 2024, is a markdown file at your site root that tells AI engines what's worth indexing and how your site is structured. It's not yet a hard standard the way robots.txt is, but adoption is climbing across SaaS and content sites, and it's a clear signal that your site is AI-aware.

A minimum viable llms.txt contains a one-paragraph site description, a list of core pages, a list of product or feature pages, and a list of resources. Sivon HQ's own llms.txt lives at sivonhq.com/llms.txt and is roughly 40 lines.
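Here's a sketch of that minimum in the markdown shape the proposal describes — the company, pages, and URLs are placeholders, not Sivon HQ's actual file:

```markdown
# Example Co

> Example Co is an AI marketing platform for small teams. This file tells AI
> engines what the site is about and which pages are worth retrieving.

## Core pages

- [Home](https://example.com/): what the product does and who it's for
- [Pricing](https://example.com/pricing): plans and billing options

## Product

- [AI Visibility](https://example.com/features/ai-visibility): citation tracking across the major AI search surfaces

## Resources

- [Blog](https://example.com/blog): guides on AI search optimization
- [What is llms.txt](https://example.com/blog/what-is-llms-txt): the spec and worked examples
```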

Deep dive: the llms.txt spec, examples, and common mistakes.

2. Deepen your structured data

Most sites have one or two schema blocks (Article on blog posts, maybe Organization on the home page). The AI-search-ready stack is wider: Organization on every page, WebPage on every page, BreadcrumbList on every internal page, Article on blog posts, FAQPage on any page that has questions, HowTo on procedural posts, SoftwareApplication or Product on landing pages, and Person schema for any author bylines.

The single highest-leverage addition for most sites is FAQPage. Models read the mainEntity array and pull questions and answers directly. We've watched pages get cited verbatim from their FAQ schema in Perplexity. A page with five FAQ questions in schema is effectively five citation surfaces, not one.
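For reference, a minimal FAQPage block with two entries — the questions are illustrative; the part the engines read is the mainEntity array:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is AI search optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AI search optimization is the work of getting your pages cited in generated answers from ChatGPT, Perplexity, Google AI Overviews, and Claude."
      }
    },
    {
      "@type": "Question",
      "name": "How is it different from traditional SEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The unit of value is a citation inside the generated answer rather than a ranking position, so passage extractability and entity clarity carry more weight."
      }
    }
  ]
}
</script>
```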

Don't fake it. Schema that doesn't match visible content gets caught by Google and the AI engines both. Write the FAQs visibly on the page, then mirror them into schema.

3. Clean up entity signals

Pick the one canonical name for your company and use it everywhere. "Sivon HQ" — not "Sivon", not "sivonhq", not "Sivon Inc". Add a sameAs array to your Organization schema with your LinkedIn, Twitter, Crunchbase, GitHub, and Wikidata URLs. If you don't have a Wikidata entry, create one (it takes 20 minutes, costs nothing, and is one of the strongest entity signals available — Wikidata feeds Wikipedia, Google's Knowledge Graph, and several open AI training pipelines).
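For reference, a minimal Organization block with a sameAs array — the profile URLs are placeholders to swap for your real accounts, not Sivon HQ's actual ones:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Sivon HQ",
  "url": "https://sivonhq.com",
  "sameAs": [
    "https://www.linkedin.com/company/example",
    "https://twitter.com/example",
    "https://www.crunchbase.com/organization/example",
    "https://github.com/example",
    "https://www.wikidata.org/wiki/Q00000000"
  ]
}
</script>
```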

Then audit the open web. Search for your brand name and look at how it's spelled in the top 20 results. If half the references say "Sivon" and half say "Sivon HQ", the engine has a disambiguation problem. Reach out to the high-authority misspellings and request a fix. This is unglamorous work and it moves the needle.

4. Rewrite for citation-worthiness

Look at any page on your site and ask: "if a model needed to cite a single sentence from this page, which one?" If the answer is "I don't know" or "the whole opening paragraph", the page isn't built for citation.

The fix is structural:

  • Lead with a one-sentence answer to the H2 question, then expand.
  • Use concrete numbers, dollar figures, dates, and named sources wherever possible. "Aggarwal et al. (2023) found that citation, statistics, and quotation interventions produced the largest visibility lifts" cites; "research suggests" doesn't.
  • Quote real people by name. A quote from "Jane Smith, Director of Marketing at Acme" is more cite-able than the same words as your own prose.
  • Break long passages with H3 subheadings. Models cite within a heading scope.

The GEO paper tested these interventions empirically and reported double-digit visibility lifts for the top three. The methodology is academic and the engines have moved since the paper was published, but the directional pattern holds in every audit we've run.

Deep dive: the six GEO levers and how to apply them.

5. Earn brand mentions across the open web

Generative engines read forums, podcast transcripts, GitHub READMEs, Hacker News threads, Reddit, dev.to, Hashnode, and a long tail of other open-web sources. A site that exists only on its own domain looks thinner to a model than the same site plus 50 mentions across well-indexed corners of the internet.

The work overlaps with traditional digital PR but the targets are different. For AI citation lift, prioritise:

  • Subreddits relevant to your buyer. Helpful comments that mention your tool by name when genuinely relevant (not spam) get indexed and cited.
  • Podcast guest appearances that get transcribed and posted publicly. Transcripts are dense entity signal.
  • Open-source GitHub repositories with READMEs that mention your tool. Strong citation signal because GitHub is heavily indexed.
  • dev.to and Hashnode syndications of your blog posts, with canonical URLs pointing back to your site.
  • Hacker News Show HN posts for product launches.

What we don't recommend: paying for placements on low-authority "AI tool directory" sites. Engines have learned to discount these.

6. Tighten freshness signals

Three changes:

  • Add dateModified to every page that's been updated in the last 12 months. Many sites have datePublished only — the engine assumes the page is as old as the publish date.
  • Refresh your top 10% of pages on a quarterly cadence. A real edit — new section, updated stat, removed stale claim — not just a date bump. Engines (and Google) flag bumped dates with no content change.
  • Add a "Last updated" line in visible content near the top of long-form posts. This is a UX signal as well as a freshness one — buyers reading a 2024 post in 2026 want to know it's been maintained.

7. Consider an llms-full.txt

The long-form variant of llms.txt, intended for retrieval rather than navigation. Where llms.txt is a sitemap-style index, llms-full.txt (sometimes published as llms.txt with full content inline) is a single markdown file with every page on your site concatenated, structured for retrieval ingestion.

This is overkill for most sites. Add it only when (a) you have stable, valuable content that you want cited verbatim, (b) your site has fewer than ~200 pages, and (c) you've already done items 1–6. Below 200 pages, the file stays manageable. Above that, a structured llms.txt index is a better investment.
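If you do ship one, the build step is small. A minimal sketch, assuming your pages already live as markdown files under a content/ directory — the paths and the separator convention are ours, not part of any spec:

```python
# build_llms_full.py: concatenate markdown pages into a single llms-full.txt.
# Assumes pages live as .md files under ./content; adjust to your setup.
from pathlib import Path

pages = sorted(Path("content").rglob("*.md"))
parts = []
for page in pages:
    text = page.read_text(encoding="utf-8").strip()
    # Prefix each page with its source path so a retrieval system can
    # attribute an extracted passage back to a specific page.
    parts.append(f"# Source: {page.as_posix()}\n\n{text}")

Path("llms-full.txt").write_text("\n\n---\n\n".join(parts), encoding="utf-8")
print(f"Wrote llms-full.txt from {len(pages)} pages")
```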

Deep dive on the spec, structure, and worked examples: what is llms.txt.

How to measure AI search visibility

Honest section: there's no Google Search Console for AI search. Yet. The measurement stack today is a mix of manual audits, branded prompts, and a few early tools, and you should be sceptical of any tool claiming a single "AI visibility score."

What we actually use:

Manual branded prompts. Once a week, run 10–15 prompts in ChatGPT, Perplexity, and Google AI Overviews that a buyer might actually type. Mix branded ("what is Sivon HQ"), comparison ("alternatives to Jasper"), and informational ("how to get cited by Perplexity"). Note for each: did your domain show up in citations, in the answer text, or not at all. A simple spreadsheet is enough for the first 90 days.

ChatGPT's "Browse" / SearchGPT panel. When ChatGPT does retrieval, it shows the sources it consulted. Click through to see which queries pull your site and which don't. This is the closest thing to an inverted index for ChatGPT.

Perplexity's citation panel. Every Perplexity answer shows numbered citations. Run the same prompt twice — citation order isn't deterministic — but a domain that consistently appears in the top 3 citations for your target queries is winning the engine.

Google AI Overview impressions in GSC. Google has begun separating AI Overview impressions in Search Console for some accounts. If you have it, the data is sparse but real. If you don't yet, manual SERP checks for your target queries fill the gap.

Brand-mention monitoring. Tools that crawl Reddit, podcast transcripts, and the open web for brand mentions help you see the off-domain entity signals that engines are reading. We use simple Google Alerts plus a weekly Reddit check; lots of paid tools exist if you need more.

Dedicated AI visibility tools. A small but growing category — Profound, Goodie AI, Otterly, and others — claim to track AI engine citations more systematically. We use them as a directional signal, not a system of record. The methodology is still maturing.

The honest answer: if you're auditing your AI search visibility for the first time, start with a spreadsheet of 20 prompts run weekly across the four engines. That's 80 data points a week and tells you more than any tool will in the first 90 days.

A concrete example of how to set this up in 30 minutes:

  1. Pick 20 prompts. Five branded ("what is [brand]"), five comparison ("[brand] vs [competitor]"), five informational ("how to [thing your buyer searches]"), five category ("best [thing] for [ICP]"). The mix matters — branded prompts test entity confidence, comparison prompts test how you stack against competitors, informational prompts test cluster authority, category prompts test whether you're considered a member of the category at all.
  2. Set up the tracker. A spreadsheet with rows = prompts, columns = engines × week. Per cell: cite (your domain is in the citations panel), mention (your brand is in the answer text but not cited), none, or competitor (a competitor is cited where you should be).
  3. Run the audit weekly. Same time, same prompts. Don't change the prompt list for at least 90 days — drift in the prompts hides drift in your visibility.
  4. Score the trend, not the absolute. Week 1 numbers are baseline. The signal is whether cite rates climb over weeks 4–12 as the cluster matures. If they don't, something in items 1–7 above isn't landing.

This is roughly the same thing the AI Visibility engine in Sivon HQ runs as a managed service — same prompt taxonomy, automated weekly cadence, and a delta report against the prior week. The DIY version works fine for a 90-day pilot. The managed version is for when you don't want to spend an hour a week running prompts manually.

Our take: the difference between sites that rank in AI search and sites that don't isn't tooling. It's whether they treat the audit as a real recurring practice or a one-time cleanup. Engines change monthly. The on-page work is durable; the measurement loop is what keeps you ahead of drift.

If you want this audit run for you with a ranked fix list, that's exactly what Sivon HQ's AI Visibility engine does — it scores your site against the seven fixes above and outputs the ones with the highest expected lift.

Worked example: auditing sivonhq.com

Eat your own dog food. Here's the AI search audit we ran on sivonhq.com after launch in May 2026, plus what we changed.

Where we started. New domain. Zero authority. Schema on the home page (Organization, SoftwareApplication, FAQPage) but nothing on internal pages. No llms.txt. Cover images and OG tags on most pages but inconsistent. Brand mentions: zero. Indexability: clean — robots.txt allowed all four AI bots from day one, which most sites still get wrong.

What we fixed in the first week.

  1. Added llms.txt at sivonhq.com/llms.txt. Forty lines, structured with a one-paragraph product summary, core pages, product features, and resources. Took 30 minutes.
  2. Deepened schema on every page. Organization and WebPage everywhere, BreadcrumbList on every internal page, Article on blog posts, FAQPage on home/pricing/feature pages, Product on /pricing with offers for monthly and annual tiers, SoftwareApplication site-wide. Six file changes, mostly factoring schema builders into a shared library (a generic sketch of that pattern follows this list).
  3. Cleaned the brand entity. "Sivon HQ" everywhere, never "Sivon" or "SivonHQ". Added sameAs array on Organization schema with our Twitter and LinkedIn URLs. Created a Wikidata entry (still a stub, but indexed).
  4. Rewrote the home page for citation-worthiness. One-sentence answers to each section's H2 question, concrete numbers wherever defensible, named sources with inline citations.
  5. Built the alternatives cluster — five competitor comparison pages with SoftwareApplication, FAQPage, and BreadcrumbList schema each. Within two weeks of launch we'd been crawled by GPTBot, ClaudeBot, and PerplexityBot (visible in our access logs).
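The shared schema library mentioned in item 2 is just a handful of small builders composed per page. This is a generic sketch of the pattern in Python, not Sivon HQ's actual implementation, and the names and URLs are placeholders:

```python
# schema_builders.py: shared helpers so every page emits consistent JSON-LD.
import json

def organization(name: str, url: str, same_as: list[str]) -> dict:
    return {"@context": "https://schema.org", "@type": "Organization",
            "name": name, "url": url, "sameAs": same_as}

def breadcrumbs(trail: list[tuple[str, str]]) -> dict:
    return {"@context": "https://schema.org", "@type": "BreadcrumbList",
            "itemListElement": [
                {"@type": "ListItem", "position": i + 1, "name": name, "item": url}
                for i, (name, url) in enumerate(trail)]}

def render(*blocks: dict) -> str:
    # One <script> tag per block, injected into the page <head>.
    return "\n".join(
        f'<script type="application/ld+json">{json.dumps(b)}</script>' for b in blocks)

# Example: compose the blocks an internal page needs.
html = render(
    organization("Example Co", "https://example.com", ["https://www.wikidata.org/wiki/Q00000000"]),
    breadcrumbs([("Home", "https://example.com"), ("Blog", "https://example.com/blog")]))
```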

What we're tracking. A weekly spreadsheet of 25 prompts across the four engines. Branded prompts ("what does Sivon HQ do"), comparison prompts ("Sivon HQ vs Jasper"), and category prompts ("AI marketing tools for small teams"). Citation count, answer text inclusion, and source order. The first 30 days are mostly zeros — that's expected for a 2-week-old domain. We'll publish the 90-day numbers when we have them.

What we haven't fixed yet. llms-full.txt (under 200 pages, but waiting until the cluster ships fully). Brand mentions across the open web (Phase 7 in our SEO playbook — manual outreach, not code). Refresh cadence on older posts (we don't have older posts; check back in Q4).

The point of this section isn't to claim Sivon HQ is a model AI-search-optimized site. We're 14 days old. The point is the audit framework runs the same on a 14-day-old site as on a 14-year-old one, and the fixes are independent enough to be done in any order.

Where to go next

The cluster:

  • What is llms.txt? — The spec, why it matters, how to write one, and Sivon HQ's own llms.txt as a worked example.
  • How to rank in ChatGPT — The SearchGPT pipeline, citation factors, and a 5-step audit checklist for ChatGPT specifically.
  • The Google AI Overviews guide — What triggers an Overview, how to be the source Google cites, and the cannibalization debate.
  • Generative engine optimization — The Princeton GEO paper, the six levers it identified, and how to apply them in 2026.
  • The Perplexity SEO guide — How Perplexity's retrieval pipeline ranks sources and what optimization tactics actually move the citation panel.

If you want this audit run for you across all four surfaces with a ranked fix list and a weekly cadence — instead of running prompts manually in a spreadsheet — that's the AI Visibility engine inside Sivon HQ. It's free to run on one domain.

The work compounds. AI search optimization in 2026 looks roughly like SEO in 2010 — early, the rules are not fully written, and the operators who do the work this year are the ones who get cited next year.