Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control

ICLR 2026 Conference Submission18978 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Retrieval Augmented Generation

Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to mitigating large language model (LLM) hallucinations by incorporating external knowledge retrieval. However, existing RAG frameworks often apply retrieval indiscriminately, which leads to inefficiency: they over-retrieve when retrieval is unnecessary, or fail to retrieve iteratively when complex reasoning requires it. Although recent adaptive methods can choose among alternative retrieval strategies, they base that choice solely on query complexity and offer no mechanism for prioritizing speed over accuracy or vice versa. This lack of user-defined control makes them infeasible for applications with diverse user needs. In this paper, we introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off. Our approach leverages two classifiers: one trained to prioritize accuracy and another trained to prioritize retrieval efficiency. Via an interpretable control parameter $\alpha$, users can seamlessly navigate between minimal-cost retrieval and high-accuracy retrieval depending on their specific requirements. We empirically demonstrate that our approach effectively balances accuracy, retrieval cost, and user controllability, making it a practical and adaptable solution for real-world applications. Code is available on Anonymous GitHub at https://anonymous.4open.science/r/Flare-RAG-Anonymous-D6A2/.
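
The abstract does not state how $\alpha$ combines the two classifiers. As a minimal sketch only, assume each classifier outputs a probability distribution over three hypothetical retrieval strategies (no retrieval, single-step, multi-step) and that $\alpha$ linearly mixes the two distributions before taking the most probable strategy. The function `select_strategy` and the strategy labels are illustrative names, not the authors' code.

```python
from typing import Sequence

# Hypothetical strategy labels (an assumption, not from the paper),
# ordered from cheapest to most expensive.
STRATEGIES = ("no_retrieval", "single_step", "multi_step")


def select_strategy(
    p_accuracy: Sequence[float], p_cost: Sequence[float], alpha: float
) -> str:
    """Choose a retrieval strategy by mixing two classifiers' probabilities.

    Assumed semantics: alpha = 1.0 follows the accuracy-oriented classifier
    entirely; alpha = 0.0 follows the cost-oriented classifier entirely.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(p_accuracy) != len(STRATEGIES) or len(p_cost) != len(STRATEGIES):
        raise ValueError("expected one probability per strategy")
    # Assumed linear interpolation between the two predicted distributions.
    mixed = [alpha * a + (1.0 - alpha) * c for a, c in zip(p_accuracy, p_cost)]
    # max() returns the first maximal index, so ties go to the cheaper strategy.
    best = max(range(len(mixed)), key=lambda i: mixed[i])
    return STRATEGIES[best]


if __name__ == "__main__":
    p_acc = [0.1, 0.3, 0.6]   # accuracy-oriented classifier favors multi-step
    p_cost = [0.7, 0.2, 0.1]  # cost-oriented classifier favors no retrieval
    for a in (0.0, 0.5, 0.8, 1.0):
        print(a, select_strategy(p_acc, p_cost, a))
```

Under these assumptions, $\alpha = 0$ reproduces the cost-oriented classifier's choice and $\alpha = 1$ the accuracy-oriented one, with intermediate values trading between them.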

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 18978
