Learning When to Search: Multi-Stage Reward Design for Efficient Agentic RAG with Reasoning Language Models
Keywords: Under-Retrieval, Reinforcement Learning, Performance–Efficiency Trade-off
Abstract: Reasoning Language Models (RLMs) provide strong step-by-step reasoning and evidence integration for agentic Retrieval-Augmented Generation (RAG), yet their retrieval behaviors remain unstable, often exhibiting over- and under-retrieval that degrades both efficiency and reliability. A controlled comparison under matched parameter scales shows that RLMs consistently outperform conventional LLMs across QA benchmarks, with particularly pronounced gains on multi-hop tasks. Further analysis indicates that RLMs tend to repeatedly verify retrieved evidence and reconcile it with internal knowledge, which helps reduce hallucinations but can also trigger confirmation-heavy overthinking when internal knowledge is uncertain or external evidence is insufficient or conflicting, leading to redundant retrieval. To address this issue, we propose a reinforcement learning framework centered on a multi-stage reward design that explicitly couples answer-quality rewards with the number of retrieval steps. Through staged optimization, the model learns when to retrieve and how many retrievals are necessary, balancing correctness against retrieval cost. Experiments across multiple benchmarks demonstrate that our approach improves average F1 and accuracy over prompt-based and prior reinforcement learning baselines while significantly reducing the average number of retrievals, achieving a better performance–efficiency trade-off.
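The abstract describes a multi-stage reward that couples answer quality with retrieval cost. A minimal sketch of one way such a staged reward could look is below; the function name, the retrieval budget, the penalty weight, and the two-stage schedule are all illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a staged reward coupling answer quality with
# retrieval cost; all names and constants here are assumptions.

def staged_reward(f1: float, num_retrievals: int, stage: int,
                  budget: int = 3, penalty: float = 0.1) -> float:
    """Combine answer quality (F1) with a retrieval-cost term.

    Stage 1 optimizes answer quality alone; later stages subtract a
    penalty for each retrieval beyond an allowed budget, so the policy
    learns when to retrieve and how many retrievals are necessary.
    """
    if stage == 1:
        return f1  # warm-up stage: reward correctness only
    excess = max(0, num_retrievals - budget)
    return f1 - penalty * excess  # trade correctness against retrieval cost
```

Under this sketch, a correct answer found with five retrievals in the second stage earns less reward than the same answer found within the three-retrieval budget, which is the performance–efficiency trade-off the abstract targets.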
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: NLP Applications, Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 1427