
Generative engine optimization (GEO): the framework, the levers, and how to apply them in 2026

Generative engine optimization is the academic term for what marketers call AI search optimization. The Princeton paper, the six levers it identified, and how to apply them now.

Nishil Bhave, Founder, Sivon HQ

Generative engine optimization (GEO) is the academic name for the work most marketing teams call AI search optimization. The term came out of a 2023 paper by Aggarwal et al. — Princeton, IIT Delhi, Georgia Tech, and the Allen Institute for AI — that ran the first systematic empirical study on what makes content cite-able by generative engines.

The paper isn't required reading to do the work, but it's the cleanest theoretical foundation we have for AI search optimization, and it's worth understanding what they tested and what they found. Several of the field's strongest priors come straight from this paper, including patterns we covered in the pillar guide about citation-worthy passages and source attribution.

This post covers what the paper says, what it doesn't, and how to apply its findings in 2026 — three years after publication, with engines that have moved meaningfully since.

The Princeton paper, in one section

Aggarwal et al., "GEO: Generative Engine Optimization" (arXiv:2311.09735, first posted November 2023, updated through 2024) defined the field. They built GEO-bench, a benchmark of 10,000 search queries across nine domains, ran them through generative search engines, and tested nine on-page interventions to see which ones lifted source visibility in generated answers.

The nine interventions they tested:

  1. Authoritative tone — rewriting passages with confident, declarative phrasing.
  2. Adding citations — explicitly attributing claims to named sources.
  3. Adding statistics — concrete numbers, percentages, dollar figures.
  4. Adding quotations — direct quotes from named sources.
  5. Fluency optimization — improving prose readability.
  6. Adding sources — listing references at the end.
  7. Keyword stuffing — increasing target keyword density.
  8. Easy-to-understand language — simplifying technical content.
  9. Technical jargon — increasing technical terminology density.

The headline finding: the top three methods (citations, quotations, statistics) produced meaningful visibility lifts in generated answers under their conditions. The paper reports lifts on the order of 30–40% for these methods on their benchmark, depending on the metric and engine. The paper distinguishes between "Position-Adjusted Word Count" and "Subjective Impression" as the two main metrics — both moved in the same direction for the top three methods.

The methodology details and exact percentages are worth reading the paper for if you want to apply the findings rigorously. The directional finding — that citation-worthy passages with concrete sourcing get cited more than the same content without — has held up across every audit we've run since.

What the paper got right and where it's incomplete

The paper landed at the right time. Generative search was new, the field had no frameworks, and Aggarwal et al. did the empirical work that established the basic priors. Three things they got right:

  1. The unit of analysis is the passage, not the page. Generative engines lift sentences. Optimizing at the page level — keyword targeting, meta descriptions — is a tier of abstraction above what actually moves citations.
  2. Source attribution matters. Pages that cite specific sources by name get cited at higher rates than pages with vague claims. This is now the default editorial discipline in any GEO-aware writing.
  3. Statistics and quotations are easy wins. They cost almost nothing to add and move the needle reliably. We've seen the same pattern in audits two and three years after the paper.

Three things the paper is incomplete on, by virtue of being published in 2023:

  1. The four-engine landscape didn't exist yet. The paper tested generative engines that existed at the time. ChatGPT search shipped in October 2024; Google AI Overviews launched in May 2024 as the production successor to SGE; Perplexity has had multiple model upgrades since the paper's data collection. The interventions that worked on 2023 generative engines may not generalize perfectly to today's mix.
  2. No engine-specific recommendations. The paper aggregates across engines. In practice, ChatGPT, Perplexity, Google AI Overviews, and Claude weight signals differently, and the optimization stack varies enough that engine-specific tuning matters. Our deep dives — ranking in ChatGPT, Google AI Overviews, Perplexity SEO — cover the engine-specific layers the paper didn't.
  3. No off-domain interventions tested. The paper is about on-page optimization. It does not test brand mentions across the open web, structured data, schema markup depth, entity hygiene, or llms.txt. These are now core parts of the AI search optimization stack and the largest gap between "the GEO paper says" and "what we recommend in 2026."

The paper is foundation, not playbook. Read it; apply the on-page findings; layer the off-domain and engine-specific work on top.

GEO vs SEO — what's shared and what's new

Most of SEO transfers directly. Some doesn't. A short comparison.

Shared signals (still matter, often as much as before):

  • Domain authority and link equity.
  • Indexability — but with an expanded crawler list (GPTBot, ClaudeBot, PerplexityBot, Google-Extended).
  • Structured data (Article, FAQPage, HowTo, Organization, Product).
  • Freshness — datePublished, dateModified, periodic refresh on top pages.
  • Page experience (Core Web Vitals, mobile-friendliness).
  • Editorial quality — original research, named authors, primary sources.
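The expanded crawler list in the indexability bullet translates directly into robots.txt. A minimal sketch that explicitly allows the four AI crawlers named above alongside traditional search bots (user-agent tokens as published by each vendor; verify against each vendor's current documentation before deploying, since tokens and policies change):

```
# robots.txt — allow AI crawlers alongside traditional search bots
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /
```

Blocking any of these blocks the corresponding engine's retrieval of your pages, which zeroes out every other lever in this post.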

New emphasis (matter more for GEO than for traditional SEO):

  • Passage-level extractability. One-sentence answers under H2 headings, in plain language, with concrete specifics.
  • Entity clarity. sameAs arrays, Wikidata presence, consistent brand spelling across the open web.
  • Citation worthiness. Explicit source attribution, named experts, dollar figures and percentages where defensible.
  • Schema depth. Wider schema coverage than traditional SEO required — FAQPage, Person, BreadcrumbList, Product with offers on every relevant page.
  • llms.txt and AI-specific surface signals. A near-zero-cost addition that's becoming a soft standard.
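Since llms.txt is near-zero-cost, here is what one looks like. This is a sketch following the llmstxt.org proposal (an H1 site name, a blockquote summary, then H2 sections of annotated links); the site name and URLs are placeholders, not a real deployment:

```markdown
# Acme Analytics

> Acme Analytics is a product analytics platform for small teams.
> The links below are the pages most useful to AI systems.

## Guides

- [AI search optimization pillar](https://example.com/ai-search): the full framework
- [Pricing](https://example.com/pricing): current plans and limits

## Optional

- [Changelog](https://example.com/changelog): release history
```

The file lives at the site root (`/llms.txt`). It's a soft standard, so treat it as cheap insurance rather than a guaranteed signal.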

No longer matters (or matters less):

  • Keyword density. Aggarwal et al. tested keyword stuffing as one of their nine interventions, and it meaningfully underperformed citations, statistics, and quotations. Density matters less; relevance still does.
  • Meta description optimization for click-through. AI engines don't surface meta descriptions in the user-visible answer the way Google's blue links do. Meta descriptions still matter for traditional CTR; they don't move generative citations.
  • Pure backlink quantity. Backlinks still matter for the underlying retrieval index, but volume without quality has always underperformed and underperforms more in a citation-driven world.

The conventional read: GEO doesn't replace SEO. It adds a layer. Sites that already do strong SEO have a 60–70% head start on GEO; the remaining 30–40% is the new emphasis above.

The six levers, applied

Aggarwal et al. tested nine interventions; six of them moved the needle enough to be worth the operational cost. We've consolidated and slightly reframed them based on three years of post-paper field experience.

1. Citations

Cite specific sources by name with linked URLs. "Aggarwal et al. (2023) found that…" cites; "research suggests" doesn't. The bar is: would a reasonable journalist accept this as sourcing? If not, rewrite.

In 2026 this is also a freshness-of-source issue. A page citing 2019 research on a topic where the field has moved gets read as stale. Pull citations toward the most recent credible source whenever possible.

2. Statistics

Concrete numbers — dollar figures, percentages, dates, counts — beat hand-wavy prose. "We've seen 30–40% drops in informational query CTR after Overviews launched" is more cite-able than "Overviews can hurt CTR." Specificity is the lever.

Don't fabricate numbers. The cost of getting caught with bad statistics is higher than the cost of having fewer of them. If a number can't be sourced or measured, write the qualitative version honestly.

3. Quotations

Direct quotes from named real people get cited at high rates. A quote from "Jane Smith, Director of Marketing at Acme" is more cite-able than the same words as your own prose. Interview your own customers, your own team, your own founder. Attribute by name and title.

This is also the strongest E-E-A-T signal in the post-Helpful Content era. Worth doing for both reasons.

4. Authoritative tone

Confident declarative writing beats hedged prose. "Generative engines lift sentences, not pages" works. "It seems like generative engines may sometimes lift sentences rather than entire pages" doesn't.

The line to walk: confident-but-honest. Don't claim certainty you don't have. But replace every "perhaps" and "potentially" you can with the actual claim, and your citation rates climb.

5. Fluency

Prose that reads cleanly cites better than prose that reads like the writer was paid by the word. The model is selecting passages, and clean writing wins on every selection axis.

Fluency in this context isn't dumbing down — it's removing friction. Aggressively cut filler sentences, tighten transitions, prefer the verb to the noun phrase. The Sivon HQ house style is a pretty good template for this; you can read it across the small team marketing playbook and the rest of this cluster.

6. Easy-to-understand language

Aggarwal et al. found that simpler language slightly outperformed technical jargon for visibility. Our experience: this depends heavily on audience. Technical readers want the precise term-of-art. General-purpose readers want plain English. Match the audience.

The compromise that wins both: use the technical term when introducing a concept, then simplify in the immediate explanation. "Passage-level extractability — that is, how easily a model can lift a single sentence from your page — is the highest-leverage on-page lever." That sentence works for both audiences.

What to measure

The paper used Position-Adjusted Word Count and Subjective Impression as benchmarks. Useful for academic comparison; impractical for marketing teams. The metrics that matter in production:

  • Citation rate by query. How often does your domain appear in the citations panel for your target queries, across the four engines? A weekly spreadsheet of 20 prompts is enough.
  • Citation position. When you're cited, where do you rank in the citations panel — first, third, sixth? Position correlates with click-through.
  • Mention without citation. When your brand appears in the answer text but isn't in the citations panel, that's a signal that entity confidence is high but on-page signal is incomplete.
  • Trend over time. Absolute numbers in any given week are noise. Trend over 12+ weeks is signal. Track relentlessly; ignore week-over-week wobble.
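The weekly-spreadsheet loop above is simple enough to script. A minimal sketch of the bookkeeping: it assumes you've already collected, per prompt and engine, the ordered citation list and the raw answer text (by whatever manual or automated means you have; no engine API is assumed here), and all prompt and domain names are illustrative:

```python
# Computes the three production metrics described above from a week's
# worth of manually or automatically collected engine responses.
from dataclasses import dataclass

@dataclass
class PromptResult:
    query: str
    engine: str            # e.g. "chatgpt", "perplexity"
    citations: list[str]   # ordered domains from the citations panel
    answer_text: str       # the generated answer itself

def weekly_metrics(results: list[PromptResult], domain: str, brand: str) -> dict:
    # Citation rate: fraction of prompts where our domain is cited at all.
    cited = [r for r in results if domain in r.citations]
    # Citation position: 1-based rank within the citations panel.
    positions = [r.citations.index(domain) + 1 for r in cited]
    # Mention without citation: brand named in the answer, domain absent
    # from the panel — the entity-confidence-without-on-page-signal case.
    mentions_only = [
        r for r in results
        if brand.lower() in r.answer_text.lower() and domain not in r.citations
    ]
    return {
        "citation_rate": len(cited) / len(results) if results else 0.0,
        "avg_position": sum(positions) / len(positions) if positions else None,
        "mention_without_citation": len(mentions_only),
    }
```

Run it weekly over the same 20 prompts, append the dict to a log, and read the 12-week trend rather than any single week.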

Detail on the cross-engine measurement framework is in the AI search optimization pillar. Engine-specific measurement nuances are covered in the Google AI Overviews and Perplexity SEO deep dives.

Where this fits in the stack

GEO is one slice of AI search optimization. The full stack:

  • Foundational signals — indexability, schema, freshness — that overlap with SEO.
  • GEO levers — the six on-page interventions above, drawn from the Princeton paper.
  • Engine-specific tactics — the ChatGPT, Overviews, Perplexity, and Claude playbooks.
  • Off-domain entity work — brand mentions across the open web, Wikidata presence, social-profile entity signals.
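The foundational-signals slice is mostly markup. A sketch of the Article JSON-LD that covers the schema, freshness, and named-author signals in one block (the dates and the sameAs URL are placeholders, not real values):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Generative engine optimization (GEO): the framework, the levers, and how to apply them in 2026",
  "author": {
    "@type": "Person",
    "name": "Nishil Bhave",
    "sameAs": ["https://www.linkedin.com/in/example"]
  },
  "datePublished": "2026-01-15",
  "dateModified": "2026-02-01",
  "publisher": { "@type": "Organization", "name": "Sivon HQ" }
}
```

Keeping dateModified honest — updated when the content actually changes — is the freshness half of this signal.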

Doing GEO well is necessary but not sufficient. Sites that nail the on-page work but skip the entity hygiene plateau quickly. Sites that build entity confidence but don't write for passage extraction get cited less than they should.

If you want this entire stack run as an audit on your domain — with the on-page GEO work scored alongside the engine-specific tactics and the off-domain entity signals — that's the AI Visibility engine in Sivon HQ. Same playbook, ranked, weekly.

The summary: read the Princeton paper once. Apply the six levers above. Layer engine-specific tactics on top. The work is durable, the levers compound, and the operators who do it consistently this year will be the cited sources next year. The field is still small enough that the bar to be excellent is achievable in months, not years.