PureCover: Bridging the Gap in Re-ranking for Retrieval-Augmented Generation via Balancing Coverage and Noise
Keywords: Retrieval-Augmented Generation, Re-ranking, Multi-objective Optimization
Abstract: Re-ranking, originating from Information Retrieval (IR), has become a critical technique for filtering retrieved documents in Retrieval-Augmented Generation (RAG). Current RAG systems often directly apply re-rankers from traditional IR, which were originally designed to provide relevant and diverse documents to human users. However, this adoption overlooks a fundamental gap: unlike humans, who can use selective attention to filter noise and focus on key evidence, LLMs lack this ability. Because of this gap, traditional re-rankers fail to cover essential evidence and to minimize noise for LLMs, significantly hurting RAG performance, especially in complex question-answering tasks. To address this, we argue that RAG re-rankers should serve a distinct objective: not only ensuring the coverage of key information but also minimizing noise in the selected document set. To achieve this objective, we propose PureCover, a document selection framework tailored for RAG. Instead of relying on traditional Top-K re-ranking, we reformulate the document selection process as a multi-objective optimization problem and solve it by exploiting LLM attention patterns during goal-oriented reasoning. To improve efficiency, we distill the selection capability into an LLM selector via a set-wise strategy. Experiments on four multi-hop QA benchmarks demonstrate that PureCover consistently outperforms state-of-the-art baselines, achieving a better balance between coverage and noise for RAG.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16803