GEO: Generative Engine Optimization
arXiv:2311.09735v1 [cs.LG] 16 Nov 2023
Abstract
The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), has the potential to generate accurate and personalized responses, and is rapidly replacing traditional search engines like Google and Bing. Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them with the help of LLMs. While this shift significantly improves user utility and generative search engine traffic, it results in a huge challenge for the third stakeholder: website and content creators. Given the black-box and fast-moving nature of generative engines, content creators have little to no control over when and how their content is displayed. With generative engines here to stay, the right tools should be provided to ensure that the creator economy is not severely disadvantaged. To address this, we introduce Generative Engine Optimization (GEO), a novel paradigm to aid content creators in improving the visibility of their content in GE responses through a black-box optimization framework for optimizing and defining visibility metrics. We facilitate systematic evaluation in this new paradigm by introducing GEO-BENCH, a benchmark of diverse user queries across multiple domains, coupled with sources required to answer these queries. Through rigorous evaluation, we demonstrate that GEO can boost visibility by up to 40% in GE responses. Moreover, we show the efficacy of these strategies varies across domains, underscoring the need for domain-specific optimization methods. Our work opens a new frontier in the field of information discovery systems, with profound implications for both developers of GEs and content creators.1
1 Code and data available at https://GEO-optim.github.io/GEO/.
* Equal Contribution
1 Introduction
The invention of traditional search engines three decades ago marked a shift in the way information was accessed and disseminated across the globe. While these search engines were powerful and ushered in a host of applications like academic research and e-commerce, they were limited to providing a list of relevant websites to user queries. The recent success of large language models (LLMs), however, has paved the way for better systems like BingChat, Google's SGE, and perplexity.ai that combine the strength of conventional search engines with the flexibility of generative models. We dub these new-age systems generative engines (GE) because they not only search for information, but also generate multi-modal responses by synthesizing multiple sources. From a technical perspective, generative engines involve retrieving relevant documents from a database (such as the internet) and using large neural models to generate a response grounded on the sources, to ensure attribution and a way for the user to verify the information.
The usefulness of generative engines for both their developers and users is evident: users can access information faster and more accurately, while developers can craft precise and personalized responses, both to improve user satisfaction and revenue. However, generative engines put the third stakeholder, website and content creators, at a disadvantage. Generative Engines, in contrast to traditional search engines, remove the need to navigate to websites by directly providing a precise and comprehensive response, which can lead to a drop in organic traffic to websites and severely impact their visibility. With several millions of small businesses and individuals relying on online traffic and visibility for their livelihood, generative engines will significantly disrupt the creator
economy. Further, the black-box and proprietary nature of generative engines makes it prohibitively difficult for content creators to control and understand how their content is ingested and portrayed by generative engines. In this work, we take a first step towards a general creator-centric framework to optimize content for generative engines, which we dub Generative Engine Optimization (GEO), to empower content creators to navigate this new search paradigm with greater confidence.
GEO is a black-box optimization framework for optimizing the visibility of web content for proprietary and closed-source generative engines (Figure 1). Generative Engine Optimization ingests a source website and outputs an optimized version of the website by tailoring and calibrating the presentation, text style, and content to increase the likelihood of visibility in generative engines.
However, note that the notion of visibility in generative engines is highly nuanced and multi-faceted (Figure 3). While average ranking on the search results page is a good measure of visibility in traditional search engines, which present a linear list of websites, this does not apply to generative engines. Generative Engines provide rich and highly structured responses and embed websites as inline citations in the response, often embedding them with different lengths, at varying positions, and with diverse styles. This necessitates visibility metrics tailor-made for generative engines, which measure the visibility of attributed sources over multiple dimensions, such as relevance and influence of citation to query, measured through both an objective and a subjective lens. Our GEO framework proposes a holistic set of visibility metrics and enables content creators to create their own customized visibility metrics.
To facilitate faithful and extensive evaluation of GEO methods in this new paradigm, we propose GEO-BENCH, a benchmark consisting of 10K queries from a diverse set of domains and sources, specially adapted for generative engines. Through systematic evaluation, we demonstrate that our proposed Generative Engine Optimization methods can boost visibility by up to 40% on a diverse set of queries, providing beneficial strategies for content creators to improve their visibility in the rapidly adopted generative engines. Among other things, we find that including citations, quotations from relevant sources, and statistics can significantly boost source visibility, with an increase of over 40% across various queries. Further, we discover a dependence of the effectiveness of Generative Engine Optimization methods on the domain of the query.
In summary, our contributions are three-fold: (1) We propose Generative Engine Optimization, the first general framework for website owners to optimize their websites for generative engines. (2) Our framework proposes a comprehensive set of visibility metrics designed for generative engines and enables content creators to create their own customized visibility metrics. (3) To foster faithful evaluation of Generative Engine Optimization methods in the age of Generative Engines, we propose the first large-scale benchmark consisting of diverse search queries from wide-ranging domains and datasets, specially tailored for Generative Engines.
2 Formulation & Methodology
2.1 Formulation of Generative Engines
Despite the deployment of a myriad of generative engines to millions of users already, there is currently no standard framework. We provide a formulation that can accommodate various modular components incorporated in their design.
We describe a generative engine, which includes several backend generative models and a search engine for source retrieval. A Generative Engine (GE) takes as input a user query $q_u$ together with personalized user information $P_U$, such as preferences and history, and returns a natural language response $r$. The GE can be represented as a function:
$f_{GE} := (q_u, P_U) \rightarrow r$    (1)
While the response $r$ can be multimodal, we simplify it to a textual response in this section.
Generative Engines comprise two crucial components: (a) a set of generative models $G = \{G_1, G_2, \ldots, G_n\}$, each serving a specific purpose like query reformulation or summarization, and (b) a search engine $SE$ that returns a set of sources $S = \{s_1, s_2, \ldots, s_m\}$ given a query $q$. We present a representative workflow in Figure 2, which, at the time of writing, closely resembles the design of BingChat. This workflow breaks down the input query into a set of simpler queries that are easier for the search engine to consume. Given a query, a query re-formulator generative model, $G_1 = G_{qr}$, generates a set of queries $Q^1 = \{q_1, q_2, \ldots, q_n\}$, which are then passed to the search engine $SE$ to retrieve a multi-set
Figure 1: Our proposed Generative Engine Optimization (GEO) method optimizes websites to boost their
visibility in Generative Engine responses. GEO's black-box optimization framework enables the owner of the
pizza website, which originally lacked visibility, to optimize their website to increase visibility under
Generative Engines. Further, GEO's general framework allows content creators to define and optimize their custom
visibility metrics, giving them greater control in this new emerging paradigm.
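The formulation in Section 2.1, a generative engine as a function $f_{GE}$ mapping $(q_u, P_U)$ to $r$ by way of a query re-formulator $G_{qr}$ and a search engine $SE$, can be made concrete with a small Python sketch. Everything here is a toy stand-in rather than the design of any deployed engine: the `Source` type, the word-overlap ranking, the `max_sources` personalization key, and the one-sentence-per-source "summarizer" are all hypothetical, and duplicates are dropped for simplicity where the text speaks of a multi-set.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Source:
    url: str
    text: str


def reformulate(q_u: str) -> List[str]:
    """Toy stand-in for G_qr: split the query on ' and ' into simpler queries."""
    parts = [p.strip() for p in q_u.split(" and ") if p.strip()]
    return parts or [q_u]


def search(q: str, corpus: List[Source], k: int = 2) -> List[Source]:
    """Toy stand-in for SE: rank sources by word overlap with the query."""
    q_words = set(q.lower().split())
    scored = [(len(q_words & set(s.text.lower().split())), s) for s in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort on score
    return [s for score, s in scored[:k] if score > 0]


def generative_engine(q_u: str, p_u: Dict[str, str], corpus: List[Source]) -> str:
    """f_GE: (q_u, P_U) -> r, with the searchable corpus passed in explicitly."""
    sources: List[Source] = []
    for q in reformulate(q_u):
        for s in search(q, corpus):
            if s not in sources:  # simplification: de-duplicate across sub-queries
                sources.append(s)
    # Hypothetical personalization knob: cap the number of cited sources.
    sources = sources[: int(p_u.get("max_sources", 3))]
    # Toy stand-in for a summarizer model: one cited sentence per source.
    return " ".join(f"{s.text.capitalize()} [{i}]." for i, s in enumerate(sources, 1))


if __name__ == "__main__":
    corpus = [
        Source("a.example", "best pizza in new york"),
        Source("b.example", "history of pizza dough"),
        Source("c.example", "gardening tips"),
    ]
    print(generative_engine("best pizza and pizza history", {}, corpus))
```

The point of the sketch is the shape of the pipeline: a content creator sees only the final response and its inline citations, which is why the optimization is treated as black-box.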
Table 1: Performance improvement of GEO methods on GEO-BENCH. Performance is measured on two metrics
and their sub-metrics. Compared to the baselines, simple methods such as Keyword Stuffing, traditionally used in
SEO, do not perform very well. However, our proposed methods such as Statistics Addition and Quotation Addition
show strong performance improvements across all metrics considered. The best performing methods improve upon
baseline by 41% and 29% on Position-Adjusted Word Count and Subjective Impression respectively. For readability,
Subjective Impression scores are normalized with respect to Position-Adjusted Word Count, resulting in baseline
scores being similar across the metrics.
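This section names a Position-Adjusted Word Count metric but does not define it. As a rough illustration of what a creator-defined visibility metric could look like, the Python sketch below credits each cited source with the words of the sentences that cite it, optionally down-weighting later sentences. The exponential decay, the equal split of credit among co-cited sources, and the `[n]` citation syntax are assumptions made for this example, not definitions taken from the text.

```python
import math
import re
from typing import Dict

CITATION = re.compile(r"\[(\d+)\]")  # assumed inline citation syntax, e.g. "[2]"


def visibility_shares(response: str, position_adjusted: bool = False) -> Dict[int, float]:
    """Share of response words credited to each cited source.

    With position_adjusted=True, a sentence at index pos (0-based) out of n
    sentences is weighted by exp(-pos / n), so earlier citations count more.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    n = len(sentences)
    total_words = 0
    credit: Dict[int, float] = {}
    for pos, sentence in enumerate(sentences):
        cited = {int(c) for c in CITATION.findall(sentence)}
        # Count words after removing the citation markers themselves.
        words = len(re.findall(r"\w+", CITATION.sub(" ", sentence)))
        total_words += words
        if not cited:
            continue
        weight = math.exp(-pos / n) if position_adjusted else 1.0
        for c in cited:
            # Assumption: credit is split equally among co-cited sources.
            credit[c] = credit.get(c, 0.0) + words * weight / len(cited)
    if total_words == 0:
        return {}
    return {c: v / total_words for c, v in credit.items()}


if __name__ == "__main__":
    response = "Pizza began in Naples [1]. It later spread to New York [2]."
    print(visibility_shares(response))
    print(visibility_shares(response, position_adjusted=True))
```

Under this sketch, a source cited only in the last sentence keeps its word count but loses weight once position is taken into account, which is one way a metric can reflect that not all citations are equally visible.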
Table 4: Representative examples of GEO methods optimizing a source website. Additions are marked in green
and deletions in red. Without adding any substantial new information to the content, GEO methods are able to
significantly increase the visibility of the source content.
Table 5: Performance improvement of GEO methods on GEO-BENCH with Perplexity.ai as the generative engine.
Compared to the baselines, simple methods such as Keyword Stuffing, traditionally used in SEO, do not perform very
well and often show negative performance. However, our proposed methods such as Statistics Addition and Quotation
Addition show strong performance improvements across all metrics considered. The best performing methods
improve upon baseline by 22% on Position-Adjusted Word Count and 37% on Subjective Impression. The scores
demonstrate the high impact of our proposed methods directly on already deployed generative engines.
et al., 2020; Kumar et al., 2019). These methods are typically classified into On-Page SEO, which involves improving the actual content of the website and optimizing user experience and accessibility, and Off-Page SEO, which involves improving the website's authority and reputation through link building and recognition. In contrast, GEO deals with a more complex environment involving multi-modality and conversational settings. Further, since GEO is optimized against a generative model that is not limited to simple keyword matching, traditional SEO-based strategies will not be applicable to Generative Engine settings, highlighting the need for GEO.
8 Conclusion
In this work, we formulate the new-age search engines that we dub generative engines and propose Generative Engine Optimization (GEO) to help put the power in the hands of content creators to optimize their content. We define impression metrics for generative engines and propose a benchmark encompassing diverse user queries from multiple domains and settings, along with relevant sources needed to answer those queries. We propose several ways to optimize content for generative engines and demonstrate that these methods are capable of boosting source visibility by up to 40% in generative engine responses. Among other things, we find that including citations, quotations from relevant sources, and statistics can significantly boost source visibility. Further, we discover a dependence of the effectiveness of Generative Engine Optimization methods on the domain of the query. Our work serves as a first step towards understanding the impact of generative engines on the digital space and the role of Generative Engine Optimization in this new age of search engines.
Ethical Considerations and Reproducibility Statement
In our study, we focus on enhancing the visibility of websites in generative engines. We do not directly interact with sensitive data or individuals. While the sources we retrieve from search engines may contain biased or inappropriate content, these are already publicly accessible, and our study neither amplifies nor endorses such content. We believe that our work is ethically sound as it primarily deals with publicly available information and aims to improve the user experience in generative engines.
Regarding reproducibility, we have made our code available to allow others to replicate our results. Our main experiments have been conducted with five different seeds to minimize potential statistical deviations.
References
Daria Alexander, Wojciech Kusa, and Arjen P. de Vries. 2022. ORCAS-I: Queries annotated with intent using weak supervision. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval.
Prashant Ankalkoti. 2017. Survey on search engine optimization tools & techniques. Imperial Journal of Interdisciplinary Research, 3.
Akari Asai, Xinyan Velocity Yu, Jungo Kasai, and Hannaneh Hajishirzi. 2021. One question answering model for many languages with cross-lingual dense passage retrieval. In Neural Information Processing Systems.
Sihao Chen, Daniel Khashabi, Wenpeng Yin, Chris Callison-Burch, and Dan Roth. 2019. Seeing things from a different angle: Discovering diverse perspectives about claims. In North American Chapter of the Association for Computational Linguistics.
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Fernando Campos, and Jimmy J. Lin. 2021. MS MARCO: Benchmarking ranking models in the large-data regime. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-augmented language model pre-training. ArXiv, abs/2002.08909.
Bernard Jim Jansen, Danielle L. Booth, and Amanda Spink. 2008. Determining the informational, navigational, and transactional intent of web queries. Inf. Process. Manag., 44:1251–1266.
Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38.
R. Anil Kumar, Zaiduddin Shaik, and Mohammed Furqan. 2019. A survey on search engine optimization techniques. International Journal of P2P Network Trends and Technology.
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur P. Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc V. Le, and Slav Petrov. 2019. Natural Questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics, 7:453–466.
Nelson F. Liu, Tianyi Zhang, and Percy Liang. 2023a. Evaluating verifiability in generative search engines. ArXiv, abs/2304.09848.
Yang Liu, Dan Iter, Yichong Xu, Shuo Wang, Ruochen Xu, and Chenguang Zhu. 2023b. G-Eval: NLG evaluation using GPT-4 with better human alignment. ArXiv, abs/2303.16634.
Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, Geoffrey Irving, and Nathan McAleese. 2022. Teaching language models to support answers with verified quotes. ArXiv, abs/2203.11147.
Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ramakanth Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, and Thomas Scialom. 2023. Augmented language models: a survey. ArXiv, abs/2302.07842.
Reiichiro Nakano, Jacob Hilton, S. Arun Balaji, Jeff Wu, Ouyang Long, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, and John Schulman. 2021. WebGPT: Browser-assisted question-answering with human feedback. ArXiv, abs/2112.09332.
A. Shahzad, Deden Witarsyah Jacob, Nazri M. Nawi, Hairulnizam Bin Mahdin, and Marheni Eka Saputri. 2020. The new trend for search engine optimization, tools and techniques. Indonesian Journal of Electrical Engineering and Computer Science, 18:1568.
Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, W.K.F. Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, and Jason Weston. 2022. BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage. ArXiv, abs/2208.03188.
Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam M. Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, Yaguang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Yanqi Zhou, Chung-Ching Chang, I. A. Krivokon, Willard James Rusch, Marc Pickett, Kathleen S. Meier-Hellstern, Meredith Ringel Morris, Tulsee Doshi, Renelito Delos Santos, Toju Duke, Johnny Hartz Søraker, Ben Zevenbergen, Vinodkumar Prabhakaran, Mark Díaz, Ben Hutchinson, Kristen Olson, Alejandra Molina, Erin Hoffman-John, Josh Lee, Lora Aroyo, Ravindran Rajakumar, Alena Butryna, Matthew Lamm, V. O. Kuzmina, Joseph Fenton, Aaron Cohen, Rachel Bernstein, Ray Kurzweil, Blaise Aguera-Arcas, Claire Cui, Marian Rogers Croak, Ed Huai hsin Chi, and Quoc Le. 2022. LaMDA: Language models for dialog applications. ArXiv, abs/2201.08239.
Table 6: Performance improvement of GEO methods on GEO-BENCH. Performance is measured on two metrics
and their sub-metrics.