More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG

ACL ARR 2025 February Submission 2062 Authors

14 Feb 2025 (modified: 09 May 2025) | ACL ARR 2025 February Submission | CC BY 4.0
Abstract: Retrieval-augmented generation (RAG) provides LLMs with relevant documents. Although previous studies noted that retrieving many documents can degrade performance, they did not isolate how the quantity of documents affects performance while controlling for context length. We evaluate various language models on custom datasets derived from a multi-hop QA task. We keep the context length and position of relevant information constant while varying the number of documents, and find that increasing the document count in RAG settings poses significant challenges for LLMs. Additionally, our results indicate that processing multiple documents is a separate challenge from handling long contexts. We will publicly release the datasets and code upon publication to facilitate further research in multi-document retrieval.
Paper Type: Short
Research Area: Information Retrieval and Text Mining
Research Area Keywords: RAG, Multi-document, Evaluation
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 2062