What 9 Studies Say AI Search Engines Actually Want (And Most of It Will Surprise You)

April 18, 2026 · 9 min read

Everyone has a theory about how to rank in AI search. Add statistics. Use authority signals. Write longer content. The advice is everywhere, and almost none of it is backed by data.

Over the past year, a wave of peer-reviewed research has actually tested what drives AI citations at scale. Tens of thousands of queries, hundreds of thousands of web pages, multiple engines. The results consistently overturn the conventional wisdom. Some of the most popular GEO tactics are actively hurting performance. Some overlooked signals are doing the heavy lifting.

Here is what nine studies found.

1. Simpler language gets cited more often

Researchers analyzed approximately 10,000 websites and found that text perplexity, a measure of how surprising text is to a language model (lower means more predictable), is a significant citation driver. A one-standard-deviation drop in perplexity increased citation probability from 47% to 56%. That is a nine-point lift from making your sentences more straightforward and your structure more predictable.

The practical implication: AI systems are themselves language models. They find fluent, predictable text easier to process and excerpt. Content that reads like a human would naturally write it, without unusual phrasing or convoluted sentence structures, has a measurable advantage.
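Perplexity is not an abstract quantity; you can score your own copy with any open language model. Here is a minimal sketch using GPT-2 via Hugging Face transformers as an illustrative stand-in (the study does not specify which model to benchmark against):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

plain = "Short sentences with common words are easy to predict."
ornate = "Succinct locutions employing quotidian verbiage prove eminently foreseeable."
print(perplexity(plain))   # lower perplexity: more predictable
print(perplexity(ornate))  # higher perplexity: more surprising
```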

The same study found that Google AI Overviews prefer content with higher semantic similarity to the sources they already cite. If your page covers a topic in a way that aligns with the existing knowledge cluster around that query, you are more likely to be included.
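You can check that alignment yourself by embedding your draft alongside the pages currently cited for a query and comparing them. A sketch, assuming the sentence-transformers library and an arbitrary small embedding model (the model choice is mine, not the study's):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")

draft = "How structured data affects citation rates in AI search engines."
cited = [
    "Schema markup and its measured effect on AI Overview citations.",
    "Why generative engines favor machine-readable page structure.",
]

embeddings = model.encode([draft] + cited)
# Cosine similarity between the draft and each already-cited page.
for page, score in zip(cited, cos_sim(embeddings[0], embeddings[1:])[0]):
    print(f"{float(score):.2f}  {page}")
```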

2. Document structure matters at three levels

A 2026 study on structural features across six generative engines found that well-structured content improved citation rates by 17.3% and subjective quality scores by 18.5%. The researchers broke structure into three levels: macro (the overall architecture of a document), meso (how information is chunked and organized within sections), and micro (visual emphasis like bold text and headers).

All three levels contributed. Getting the macro right (a clear logical flow with a defined beginning, middle, and end) was not enough on its own. How you chunk information within sections, and whether you use visual markers to signal what matters, both add independent value.

This aligns with how AI systems parse pages. They are not reading the way humans do. They are extracting. Content that makes extraction easy by using consistent structure at every level gives AI engines more to work with.
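One way to make the three levels concrete is a quick self-audit. The checks below are my own rough heuristics for each level, not criteria from the paper:

```python
from bs4 import BeautifulSoup

def audit_structure(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    levels = [int(h.name[1]) for h in soup.find_all(["h1", "h2", "h3", "h4"])]
    return {
        # Macro: exactly one h1, and no skipped heading levels on the way down.
        "macro_ok": levels.count(1) == 1
        and all(b - a <= 1 for a, b in zip(levels, levels[1:])),
        # Meso: how the page is chunked into sections and lists.
        "meso_sections": len(soup.find_all(["section", "h2"])),
        "meso_lists": len(soup.find_all(["ul", "ol", "table"])),
        # Micro: visual emphasis that signals key phrases to an extractor.
        "micro_emphasis": len(soup.find_all(["strong", "b", "em", "mark"])),
    }

print(audit_structure("<h1>Title</h1><h2>Part</h2><p><strong>Key.</strong></p>"))
```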

3. Semantic HTML and outlinks are stronger signals than domain authority

A December 2025 study analyzed 55,936 queries across six LLM search engines and built a predictive model for which pages get cited (F1 = 0.758). The strongest predictors were semantic HTML, readability, and outbound links to reputable sources. Not domain authority. Not backlinks. Not content length.
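Rough versions of those three predictors are easy to extract for your own pages. A sketch, with average sentence length standing in as a crude readability proxy (the paper's exact feature definitions are not reproduced here):

```python
import re
from urllib.parse import urlparse
from bs4 import BeautifulSoup

SEMANTIC_TAGS = {"article", "section", "nav", "aside", "header",
                 "footer", "main", "figure", "figcaption", "time"}

def citation_features(html: str, own_domain: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(" ", strip=True)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    outbound = [
        a["href"] for a in soup.find_all("a", href=True)
        if urlparse(a["href"]).netloc not in ("", own_domain)
    ]
    return {
        "semantic_tag_count": sum(len(soup.find_all(t)) for t in SEMANTIC_TAGS),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "outbound_links": len(outbound),
    }
```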

Two other numbers from this study are worth paying attention to. LLM search engines cite an average of 4.3 URLs per query, compared to 10.3 for traditional search. The pool is dramatically smaller. And 37% of domains that appear in LLM citations do not appear in traditional search results at all.

That last number reframes the whole opportunity. AI search is not just a reshuffling of existing SEO rankings. It is a different game with different winners. Optimizing schema markup and earning AI citations is not an incremental SEO tactic. It is one of the clearest ways to enter a citation pool that traditional SEO cannot even reach.

4. Pages that pass a quality threshold get cited 78% of the time

A September 2025 study analyzed 1,702 citations across Brave Summary, Google AI Overviews, and Perplexity and identified 16 quality pillars that predict citation. Pages scoring 0.70 or higher and meeting at least 12 of the 16 pillars achieved a 78% cross-engine citation rate.

The four pillars with the strongest correlation (all above r = 0.63): Metadata, Content Freshness, Semantic HTML, and Structured Data. The single strongest predictor was overall page quality, with an odds ratio of 4.2. A high-quality page is 4.2 times more likely to be cited than a low-quality one, independent of any individual technical signal.
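As a sketch of how the threshold logic works: score each pillar, then require both the overall average and the count of pillars met. Only the two thresholds (0.70 overall, 12 of 16 pillars) come from the study; the pillar names, the averaging, and the per-pillar cutoff below are my placeholders.

```python
def citation_ready(pillars: dict[str, float],
                   met_cutoff: float = 0.5) -> bool:
    """Thresholds (>= 0.70 overall, >= 12 of 16 pillars) are from the
    study; averaging and the per-pillar cutoff are assumptions."""
    overall = sum(pillars.values()) / len(pillars)
    met = sum(1 for score in pillars.values() if score >= met_cutoff)
    return overall >= 0.70 and met >= 12

page = {"metadata": 0.9, "content_freshness": 0.8,
        "semantic_html": 0.85, "structured_data": 0.7}
# ...a real audit would score all 16 pillars, not just the top four.
print(citation_ready(page))  # False here: only 4 of 16 pillars scored
```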

The lesson: technical optimization without substantive content quality does not get you there. The pillars amplify good content. They do not substitute for it.

5. Most GEO tactics do not work, and some make things worse

This is the finding that should make every GEO practitioner uncomfortable. Researchers at MIT tested 15 common GEO heuristics across 7,151 queries and 52,165 Amazon products. Ten of the 15 tactics produced negligible or negative results.

The worst offender was storytelling framing. Adding narrative structure dropped rankings by 4.03 positions on average. This contradicts a lot of popular content advice, and it makes sense when you think about what AI systems are actually doing: extracting specific answers to specific questions. Narrative framing buries the answer in a story arc, making it harder to surface.

What actually worked: emphasizing user intent, competitive differentiation, social proof (reviews and ratings), and factual accuracy. These are not novel tactics. They are the fundamentals of good persuasive content. The tactics that do not work tend to be the ones that prioritize appearing optimized over being genuinely useful.

6. Direct quotes beat statistics for trustworthiness

A September 2025 benchmark tested optimization strategies across 1,030 articles and 5,353 query-article pairs. Adding direct quotes was the single most effective strategy across nearly all evaluation dimensions. Statistics had a more complicated effect: they boosted causal-impact scores but degraded trustworthiness.

The deeper finding is more important. Citation frequency and causal influence are almost entirely uncorrelated (r ≈ 0.019). A source can appear in an AI citation list while contributing almost nothing to the actual generated answer. The AI referenced it without really using it.

This is a critical distinction that most citation tracking misses. Appearing in a citation is not the same as shaping the answer. If your goal is to influence what AI search engines tell users about your category, frequency alone is not the metric you want. You need to know whether your content is actually being incorporated into the response, not just listed as a source. This is the gap that BabyPenguin is built to close: tracking citation patterns across engines with enough granularity to distinguish surface mentions from real influence.

7. Linked structured data boosts RAG accuracy by nearly 30%

A 2026 study on retrieval-augmented generation systems found that implementing linked data as a memory layer improved RAG accuracy by 29.6%. Most marketers are not thinking about their content in RAG architecture terms, but AI search engines increasingly use retrieval-augmented approaches to generate answers.

Structured data (schema markup, linked entities, clearly defined relationships between concepts on a page) is not just a technical SEO checkbox. It is the format that makes content machine-readable in the way AI retrieval systems actually need. Pages that provide structured, linked context about who they are, what they cover, and how their claims connect to authoritative external sources give RAG systems better material to work with.
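In practice that usually means schema.org JSON-LD with entities explicitly linked to external authorities. A minimal sketch, with placeholder names and URLs throughout:

```python
import json

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What 9 Studies Say AI Search Engines Actually Want",
    "datePublished": "2026-04-18",
    "author": {"@type": "Organization", "name": "Example Co"},  # placeholder
    "about": {
        "@type": "Thing",
        "name": "Generative engine optimization",
        # sameAs links the entity to an authoritative external source.
        "sameAs": "https://en.wikipedia.org/wiki/Search_engine_optimization",
    },
    "citation": ["https://example.org/study"],  # placeholder reference URL
}

print('<script type="application/ld+json">')
print(json.dumps(article, indent=2))
print("</script>")
```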

8. When everyone copies the same tactics, everyone loses

The C-SEO Bench study presented at NeurIPS 2025 modeled what happens when many actors simultaneously adopt the same optimization tactics. The answer: collective gains collapse. The advantage of any given tactic decreases as adoption spreads, and in the aggregate, manipulation-focused approaches underperform traditional content quality.

This has direct implications for GEO strategy. The tactics that get written up in listicles and shared widely are the ones most likely to be saturated. Durable advantage in AI search comes from the same place it always has in traditional search: genuinely useful content that is hard to replicate. A source gap analysis that identifies where your competitors are not showing up is more strategically valuable than copying what they are doing.

9. AI search is biased toward third-party coverage

A September 2025 study of 1,000 queries across four AI search engines found that AI engines cite earned and third-party media 70 to 90% of the time. Brand-owned content gets minimal pickup regardless of quality.

This is the finding that most brand content teams have not fully reckoned with. You can publish excellent content on your own domain, tick every technical box, and still get systematically underweighted because AI engines treat third-party sources as more credible by default. The implications for self-promotional content are significant: owned content builds the foundation, but earned media gets you into the citation pool.

Getting cited in AI search at scale requires a presence in the places AI engines trust: industry publications, analyst reports, review sites, editorial coverage.

What the research actually adds up to

Taken together, these nine studies point in a consistent direction. AI search engines reward content that is easy to parse, structured at every level, technically sound, genuinely useful, and corroborated by external sources. They penalize narrative framing that buries answers, manipulation tactics that everyone else is already using, and owned content that has not earned external validation.

The specific numbers matter. A nine-point citation lift from readability. A 78% cross-engine citation rate once you clear a quality threshold. A 4.03-position drop from storytelling framing. These are not directional hunches. They are measurable effects with sample sizes large enough to act on.

The harder problem is turning research like this into operational decisions. Which pages on your site are structurally weak at the micro level? Which competitors are getting cited on queries where you are invisible? Which of your citations are frequency without real influence? Knowing the right variables is step one. Measuring them across ChatGPT, Gemini, Grok, and the other engines your audience actually uses is where the work happens.

BabyPenguin tracks citations at the prompt level, breaks down which sources are appearing alongside yours, and lets you run side-by-side competitor comparisons across engines. The research tells you what to optimize. BabyPenguin tells you whether it is working.