Keywords: generative models, search engines, datasets and benchmarks
TL;DR: We propose generative engine optimization, a novel paradigm to help content creators navigate generative engines, a new way of discovering information on the internet.
Abstract: The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), has the potential to generate accurate and personalized responses, and is rapidly replacing traditional search engines like Google and Bing. Generative engines typically satisfy queries by synthesizing information from multiple sources and summarizing it with the help of LLMs. While this shift significantly improves user utility and generative search engine traffic, it poses a major challenge for the third stakeholder: website and content creators. Given the black-box and fast-moving nature of generative engines, content creators have little to no control over when and how their content is displayed. With generative engines here to stay, the right tools should be provided to ensure that the creator economy is not severely disadvantaged. To address this, we introduce generative engine optimization (GEO), a novel paradigm to aid content creators in improving the visibility of their content in generative engine responses. In this work, we propose several optimizations that can be applied to content to improve its visibility. To evaluate and compare different GEO methods, we propose a benchmark encompassing diverse user queries from multiple domains and settings, along with the relevant sources needed to answer those queries. Through rigorous experiments on the proposed benchmark, we demonstrate that GEO methods involving well-designed textual enhancements can boost source visibility by up to 40% in generative engine responses. We report several insights that can aid content creators; for example, adding citations and quotations significantly improves visibility. We also find that the effectiveness of these optimizations is domain-dependent, so the choice of optimization must be adapted to the source.
Our work opens a new frontier in the field of information discovery systems, with profound implications for both developers of generative engines and content creators.
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8373