Abstract: Retrieval-augmented generation (RAG) systems extend response generation beyond the pretrained knowledge of large language models by augmenting the input prompt with relevant documents retrieved by an information retrieval system. This is of particular importance when knowledge is constantly updated and cannot be memorized by the model. RAG-based systems operate in two phases: retrieval and generation. In the retrieval phase, documents are retrieved for various versions of the original query, then fused and reranked to create a unified list; the more relevant the list of documents, the better the subsequent generation phase. In this paper, we propose an unsupervised method to enhance the retrieval phase by transforming an original query into reformulated versions without semantic drift, thereby improving the relevance of the retrieved documents. Specifically, for an original query, we (1) generate its backtranslated versions via different languages, (2) retrieve an ordered list of relevant documents for each backtranslated version, and finally, (3) merge the lists of retrieved documents into a single ranked list via reciprocal rank fusion. Our extensive experiments across 5 query sets with different query topics and 10 languages from 7 language families using 2 neural machine translators demonstrated the effectiveness of our proposed method in enhancing RAG's retrieval in comparison with existing unsupervised query expanders. We open-sourced our research at https://github.com/fani-lab/RePair/tree/rrf-wise24.
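Step (3) can be illustrated with a minimal sketch of reciprocal rank fusion. This is not the code from the repository above: the function name, the document identifiers, and the smoothing constant k = 60 (a commonly used default for this technique) are assumptions made for illustration. Each document receives a score equal to the sum of 1 / (k + rank) over every list in which it appears, and the fused list is sorted by that score.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    ranked_lists: iterable of lists, each ordered from most to least relevant
                  (e.g. one list per backtranslated version of the query).
    k: smoothing constant; 60 is a common default and an assumption here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retrieval results for three backtranslated query versions.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

In this toy input, a document that ranks highly in several lists rises above one that tops only a single list, which is the behavior that makes the fusion robust to an occasional poor reformulation.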