Learning When to Search: Multi-Stage Reward Design for Efficient Agentic RAG with Reasoning Language Models
Keywords: Under-Retrieval, Reinforcement Learning, Performance–Efficiency Trade-off
Abstract: Reasoning Language Models (RLMs) provide strong step-by-step reasoning and evidence integration for agentic Retrieval-Augmented Generation (RAG), yet their retrieval behaviors remain unstable, often exhibiting over- and under-retrieval that degrades both efficiency and reliability. A controlled comparison under matched parameter scales shows that RLMs consistently outperform conventional LLMs across QA benchmarks, with particularly pronounced gains on multi-hop tasks. Further analysis indicates that RLMs tend to repeatedly verify retrieved evidence and reconcile it with internal knowledge, which helps reduce hallucinations but can also trigger confirmation-heavy overthinking when internal knowledge is uncertain or external evidence is insufficient or conflicting, leading to redundant retrieval. To address this issue, we propose a reinforcement learning framework centered on a multi-stage reward design that explicitly couples answer-quality rewards with the number of retrieval steps. Through staged optimization, the model learns when to retrieve and how many retrievals are necessary, balancing correctness against retrieval cost. Experiments across multiple benchmarks demonstrate that our approach improves average F1 and accuracy over prompt-based and prior reinforcement learning baselines while significantly reducing the average number of retrievals, achieving a better performance–efficiency trade-off.
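The abstract describes a multi-stage reward that couples answer quality with retrieval cost. A minimal sketch of one way such a staged reward could look is below; the function name, the retrieval budget, the penalty weight, and the two-stage schedule are all illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a staged reward coupling answer quality with
# retrieval cost; all names and constants here are assumptions.

def staged_reward(f1: float, num_retrievals: int, stage: int,
                  budget: int = 3, penalty: float = 0.1) -> float:
    """Combine answer quality (F1) with a retrieval-cost term.

    Stage 1 optimizes answer quality alone; later stages subtract a
    penalty for each retrieval beyond an allowed budget, so the policy
    learns when to retrieve and how many retrievals are necessary.
    """
    if stage == 1:
        return f1  # warm-up stage: reward correctness only
    excess = max(0, num_retrievals - budget)
    return f1 - penalty * excess  # trade correctness against retrieval cost
```

Under this sketch, a correct answer found with five retrievals in the second stage earns less reward than the same answer found within the three-retrieval budget, which is the performance–efficiency trade-off the abstract targets.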
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: NLP Applications, Efficient/Low-Resource Methods for NLP
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 1427