Keywords: LLM, RAG, Efficiency
Abstract: Retrieval-Augmented Generation (RAG) is pivotal for modern Large Language Models. However, its practical deployment is often hindered by prohibitive inference costs, encompassing both latency and financial overhead from retrieval calls. Current reinforcement learning frameworks focus on improving search capability by solely maximizing answer accuracy, which inadvertently encourages excessive and costly search behavior. This overlooks the fundamental trade-off between task performance and computational efficiency.
To address this, we introduce \Ours, a systematic reinforcement learning framework that teaches models to balance answer quality with search cost. We find that naively penalizing search actions leads to unstable training and suboptimal policies. \Ours therefore employs a \emph{two-stage curriculum} that first builds robust search capabilities before introducing a cost-augmented reward function to cultivate efficiency.
This learning process is underpinned by a stabilized policy optimization algorithm, ensuring the model can robustly learn a judicious policy for when to search. Experiments across diverse question-answering benchmarks show that \Ours reduces retrieval calls by up to 76.5\% while maintaining performance competitive with state-of-the-art models. By enabling a controllable balance between effectiveness and efficiency, \Ours provides a practical path toward building powerful yet economical RAG systems.
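As a minimal illustration of the cost-augmented objective described above (the paper's exact formulation may differ; the symbols $r$, $\mathrm{acc}$, $n_{\mathrm{search}}$, and $\lambda$ are illustrative, not the authors' notation), one natural form subtracts a per-retrieval penalty from the answer-accuracy reward:
\[
r(\tau) \;=\; \mathrm{acc}\bigl(\hat{y}(\tau),\, y^{*}\bigr) \;-\; \lambda \, n_{\mathrm{search}}(\tau),
\]
where $\tau$ is a generated trajectory, $\mathrm{acc}(\hat{y}, y^{*})$ scores the final answer against the reference, $n_{\mathrm{search}}(\tau)$ counts retrieval calls, and $\lambda \ge 0$ controls the effectiveness--efficiency trade-off. Under this reading, the two-stage curriculum would correspond to first training with $\lambda = 0$ to build search capability, then switching to $\lambda > 0$ to cultivate efficiency.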
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6755