A Comparative Analysis of Generative and Dense Retrieval

ACL ARR 2026 January Submission 3725 Authors

04 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Generative Retrieval, Dense Retrieval
Abstract: Generative retrieval (GR) offers an alternative to dense retrieval (DR) by directly generating identifiers of relevant documents. Relatively little is known about how the two relate. We theoretically and empirically investigate how GR fundamentally differs from DR in both learning objective and representational capacity. GR performs globally normalized maximum-likelihood optimization and encodes corpus and relevance information directly in the model parameters, whereas DR adopts locally normalized objectives and represents the corpus with external embeddings before computing similarity via a bilinear interaction. Our analysis suggests that, under scaling, GR can overcome the inherent limitations of DR, yielding two major benefits. First, with larger corpora, GR avoids the sharp performance degradation caused by the optimization drift induced by DR’s local normalization. Second, with larger models, GR’s representational capacity scales with parameter size, unconstrained by the global low-rank structure that limits DR. We validate these theoretical insights through experiments on the Natural Questions and MS MARCO datasets, across varying negative sampling strategies, embedding dimensions, and model scales. However, despite its theoretical advantages, GR does not universally outperform DR in practice. We outline directions to bridge the gap between GR's theoretical potential and practical performance, providing guidance for future research in scalable and robust generative retrieval.
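The abstract's central contrast — DR's locally normalized objective over a bilinear similarity versus GR's globally normalized likelihood over document identifiers — can be illustrated with a toy sketch. This is not the paper's method, only a schematic with invented data: the embeddings, the sampled-negative set, and the use of inner products as stand-ins for identifier log-likelihoods are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus: 5 documents with 4-dim embeddings (DR's external index).
D = rng.normal(size=(5, 4))   # document embeddings
q = rng.normal(size=4)        # query embedding

# Dense retrieval: bilinear (inner-product) scores, softmax-normalized *locally*
# over only the query's positive and sampled negatives, not the whole corpus.
local_ids = [0, 2]                       # one positive + one sampled negative
local_scores = D[local_ids] @ q
local_probs = np.exp(local_scores) / np.exp(local_scores).sum()

# Generative retrieval (schematic): the model assigns a likelihood to every
# document identifier, so the softmax normalizes *globally* over the corpus.
# Here inner products stand in for the model's identifier log-likelihoods.
global_scores = D @ q
global_probs = np.exp(global_scores) / np.exp(global_scores).sum()

# Both are valid distributions, but over different supports: the local one
# covers 2 candidates, the global one all 5 documents.
assert np.isclose(local_probs.sum(), 1.0)
assert np.isclose(global_probs.sum(), 1.0)
```

The abstract's "optimization drift" concern arises because the local distribution's support (here 2 candidates) shrinks relative to the corpus as the corpus grows, so the training signal diverges from the true global objective.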
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: Information Retrieval and Text Mining
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 3725