LLM-Augmented Retrieval: Generalize Retriever Models to Specific Domains Without Finetuning

ACL ARR 2025 February Submission 1864 Authors

14 Feb 2025 (modified: 09 May 2025) · License: CC BY 4.0
Abstract: Recent advancements in embedding-based retrieval, commonly referred to as dense retrieval, have achieved state-of-the-art results, surpassing traditional sparse or bag-of-words methods. Embedding-based techniques are widely used in enterprise and domain-specific search applications, which often require finetuning on domain-specific data to improve retrieval performance. However, the scarcity of domain-specific data and the complexity of finetuning make it difficult to build effective domain-specific retrieval systems. This paper introduces a training-free, model-agnostic, document-level embedding framework augmented by a large language model (LLM). The framework significantly improves the effectiveness of prevalent retriever models, including bi-encoders (such as Contriever and DRAGON) and late-interaction models (such as ColBERTv2), and generalizes them to new domains. As a result, this approach achieves state-of-the-art performance on benchmark datasets such as LoTTE and BEIR, highlighting its potential to advance information retrieval, particularly in domain-specific contexts.
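The abstract does not spell out the mechanism, but one plausible reading of "LLM-augmented document-level embedding" is: enrich each document with LLM-generated synthetic fields (e.g., queries and a title) and fuse their embeddings with the passage embedding. The sketch below illustrates that idea only; the encoder name, the weight values, and the `generate_with_llm` helper are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of LLM-augmented document-level embedding (not the
# authors' exact method): a document's final vector is a weighted,
# normalized combination of its passage embedding, the mean embedding of
# LLM-generated synthetic queries, and a synthetic title embedding.

import numpy as np
from sentence_transformers import SentenceTransformer

# Stand-in bi-encoder; the paper's framework is model-agnostic, so any
# retriever encoder (Contriever, DRAGON, ...) could take this role.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def generate_with_llm(prompt: str) -> list[str]:
    """Placeholder for an LLM call that returns synthetic queries or
    titles for a document. In practice this would invoke a hosted or
    local LLM; it is left unimplemented in this sketch."""
    raise NotImplementedError

def doc_level_embedding(passage: str,
                        synthetic_queries: list[str],
                        title: str,
                        w_passage: float = 0.6,
                        w_queries: float = 0.3,
                        w_title: float = 0.1) -> np.ndarray:
    """Fuse field embeddings into one document-level vector.
    The weights are illustrative, not values from the paper."""
    e_passage = encoder.encode(passage)
    e_queries = np.mean(encoder.encode(synthetic_queries), axis=0)
    e_title = encoder.encode(title)
    doc = w_passage * e_passage + w_queries * e_queries + w_title * e_title
    return doc / np.linalg.norm(doc)  # unit-normalize for cosine retrieval
```

Because no retriever weights are updated, a scheme like this stays training-free: adapting to a new domain only changes the LLM-generated fields, which is consistent with the abstract's claim of generalization without finetuning.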
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: document representation; dense retrieval; passage retrieval; generalization; domain adaptation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 1864