Keywords: Retrieval-Augmented Generation, LLM
Abstract: Retrieval-Augmented Generation (RAG) systems often rely on information retrieved from heterogeneous sources to support generation tasks. However, existing approaches typically either aggregate all sources uniformly or statically select a single source, neglecting semantic complementarity between sources. Moreover, they commonly employ re-ranking models to obtain the Top-k documents without accounting for each document's actual contribution to the generation objective.
In this paper, we propose GRO-RAG, a training-free, gradient-aware re-ranking framework for multi-source RAG.
Our method performs Top-k document selection by reading gradients from the language model, estimating each document’s contribution to the generation loss through a single backward pass.
This enables re-ranking not by heuristic relevance, but by direct feedback from the LLM's generation objective.
At the source level, we incorporate inter-source redundancy and query relevance to select a source combination prior to re-ranking.
Theoretically, we prove that this gradient-based Top-k selection approximates the optimal subset that minimizes the generation loss and aligns with minimizing an upper bound on the leave-one-out loss.
Experiments across multi-source QA and open-domain generation tasks demonstrate consistent improvements in generation quality, highlighting the importance of generation-aware retrieval selection in multi-source RAG.
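The gradient-based selection idea can be illustrated with a minimal toy sketch. This is not the paper's method or its LLM setting: the quadratic loss, the mean-pooled document weights, and the function name `grad_topk` are all illustrative assumptions. The point is only the mechanism the abstract describes: one backward pass yields a per-document gradient score, and documents whose inclusion most decreases the loss are kept.

```python
import numpy as np

def grad_topk(doc_vecs, target, k):
    """Score each document by the gradient of a toy generation loss
    w.r.t. its inclusion weight, then keep the Top-k.

    doc_vecs: (n, d) array, one embedding per retrieved document.
    target:   (d,) array standing in for the generation objective.
    """
    n, _ = doc_vecs.shape
    w = np.ones(n)                       # all documents included
    resid = doc_vecs.T @ w / n - target  # prediction error under mean pooling
    # Analytic gradient of L(w) = 0.5 * ||(1/n) * sum_i w_i x_i - target||^2:
    # dL/dw_i = (x_i . resid) / n, computed for all docs at once
    # (the toy analogue of "a single backward pass").
    grad = doc_vecs @ resid / n
    # A strongly negative gradient means increasing w_i lowers the loss,
    # i.e. the document helps generation; rank ascending and keep Top-k.
    return np.argsort(grad)[:k]

# Toy example: doc 0 points toward the target, doc 2 points away.
docs = np.array([[1.0, 0.0],
                 [0.0, 0.2],
                 [-1.0, 0.0]])
target = np.array([1.0, 0.0])
print(grad_topk(docs, target, k=2))  # doc 0 ranks first, doc 2 is dropped
```

In this sketch the helpful document (doc 0) receives the most negative gradient and is selected, while the misleading one (doc 2) is discarded, mirroring how the paper ranks documents by their estimated contribution to the generation loss rather than by heuristic relevance.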
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 6212