LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

ACL ARR 2025 February Submission2557 Authors

14 Feb 2025 (modified: 09 May 2025), License: CC BY 4.0
Abstract: In the traditional RAG framework, the basic retrieval units are normally short. Common retrievers like DPR normally work with 100-word Wikipedia paragraphs. Such a design forces the retriever to search over a large corpus to find the "needle" unit. In contrast, the reader only needs to extract answers from the short retrieved units. This imbalanced "heavy" retriever and "light" reader design can lead to sub-optimal performance. To alleviate the imbalance, we propose a new framework, LongRAG, consisting of a "long retriever" and a "long reader". LongRAG processes the entire Wikipedia into 4K-token units, which are 30x longer than before. By increasing the unit size, we significantly reduce the total number of units from 22M to 600K. This greatly lowers the burden on the retriever, which leads to a remarkable retrieval score. We then feed the top-k retrieved units (~30K tokens) to an existing long-context LLM to perform zero-shot answer extraction. Without requiring any training, LongRAG achieves an EM of 62.7% on NQ and 64.3% on HotpotQA (full-wiki), on par with the (fully-trained) SoTA model. Our study offers insights into the future roadmap for combining RAG with long-context LLMs.
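The abstract describes a two-stage pipeline: merge short passages into ~4K-token retrieval units, retrieve the top-k units (~30K tokens total), and let a long-context LLM extract the answer zero-shot. The sketch below illustrates that idea under stated assumptions; the helper names (group_passages, retriever.search, llm.generate) are hypothetical, not the authors' released code, and the greedy concatenation shown here is only one simple way to form long units, since the abstract does not specify the grouping strategy.

```python
# Minimal sketch of the LongRAG pipeline as described in the abstract.
# Assumptions: `retriever` exposes .search(query, k) -> list of unit
# strings, `llm` exposes .generate(prompt) -> str, and `num_tokens`
# counts tokens for a string. None of these are the authors' actual API.
from typing import Callable, List

MAX_UNIT_TOKENS = 4_000   # "long retriever" unit size (~4K tokens)
TOP_K = 8                 # enough units to reach roughly 30K context tokens


def group_passages(passages: List[str],
                   num_tokens: Callable[[str], int]) -> List[str]:
    """Greedily merge short passages (e.g. 100-word paragraphs) into
    ~4K-token retrieval units, shrinking the corpus from ~22M short
    passages to ~600K long units."""
    units: List[str] = []
    current: List[str] = []
    current_len = 0
    for p in passages:
        n = num_tokens(p)
        if current and current_len + n > MAX_UNIT_TOKENS:
            units.append("\n".join(current))
            current, current_len = [], 0
        current.append(p)
        current_len += n
    if current:
        units.append("\n".join(current))
    return units


def long_rag_answer(question: str, retriever, llm) -> str:
    """Retrieve top-k long units and ask a long-context LLM to
    extract the answer zero-shot (no task-specific training)."""
    units = retriever.search(question, k=TOP_K)
    context = "\n\n".join(units)
    prompt = (
        "Answer the question based on the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm.generate(prompt)
```

With units this long, the retriever only has to rank ~600K candidates instead of ~22M, while the long-context reader absorbs the extra work of locating the answer inside the ~30K-token context.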
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: retrieval augmented generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2557