Keywords: Retrieval-Augmented Generation, Submodular Optimization, Information Coverage, Multi-hop Reasoning, LLM Efficiency
Abstract: Retrieval-Augmented Generation (RAG) augments Large Language Models with external knowledge by retrieving relevant documents, yet its performance is often bottlenecked by *context construction*: selecting a small set of retrieved documents under a fixed token budget. Standard top-$k$ selection is fast but frequently wastes budget on redundant evidence and fails to cover complementary facts needed for multi-hop reasoning. We cast RAG document selection as *monotone submodular maximization* under a knapsack (token-budget) constraint, motivated by the diminishing-returns nature of information coverage. Concretely, we instantiate the objective as a weighted coverage function over query-relevant *concepts*, which is provably monotone and submodular. We then apply a standard approximation algorithm for knapsack-constrained monotone submodular maximization, obtaining a $(1-1/e)$ approximation guarantee *for this surrogate objective*. Experiments on Natural Questions, ELI5, and HotpotQA show that our framework, **Submodular-RAG (S-RAG)**, improves answer quality over top-$k$ and MMR across EM, BERTScore/ROUGE, and LLM-as-a-judge evaluations, with particularly strong gains on multi-hop questions.
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: retrieval-augmented generation, context selection, submodular optimization, information coverage, knapsack constraint, LLM reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Theory
Languages Studied: English
Submission Number: 10185
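The selection step described in the abstract can be sketched in code. The snippet below is a hypothetical illustration, not the paper's implementation: it greedily adds documents by marginal coverage gain per token under a budget, using a weighted concept-coverage objective. All names (`greedy_knapsack_select`, the document tuples) are assumptions, and note that this plain cost-benefit greedy alone achieves a weaker constant than $(1-1/e)$; the full guarantee requires a partial-enumeration variant.

```python
def greedy_knapsack_select(docs, weights, budget):
    """Cost-benefit greedy for a weighted concept-coverage objective
    under a token budget (illustrative sketch, not the paper's code).

    docs:    list of (doc_id, token_cost, set_of_concepts)
    weights: dict mapping concept -> relevance weight
    budget:  total token budget
    """
    selected, covered, spent = [], set(), 0
    remaining = list(docs)
    while True:
        best, best_ratio = None, 0.0
        for doc_id, cost, concepts in remaining:
            if spent + cost > budget:
                continue  # would exceed the token budget
            # marginal coverage gain: weights of newly covered concepts
            gain = sum(weights[c] for c in concepts - covered)
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = (doc_id, cost, concepts), ratio
        if best is None:
            break  # no feasible document adds positive gain
        selected.append(best[0])
        covered |= best[2]
        spent += best[1]
        remaining.remove(best)
    return selected, covered, spent
```

For example, with three documents where one covers two concepts, greedy prefers the denser document first and then fills remaining budget with complementary evidence, skipping documents whose concepts are already covered:

```python
docs = [("d1", 10, {"a", "b"}), ("d2", 10, {"a"}), ("d3", 5, {"c"})]
weights = {"a": 1.0, "b": 1.0, "c": 1.0}
selected, covered, spent = greedy_knapsack_select(docs, weights, budget=20)
# selects d1 (gain 2 / cost 10), then d3; d2 is redundant
```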