Keywords: Retrieval-Augmented Generation, Reinforcement Learning, Prior-Guided Learning, Structured Action Space, Query Rewriting
Abstract: Retrieval-Augmented Generation (RAG) systems mitigate factual inaccuracies in large language models (LLMs) by integrating external knowledge, but their effectiveness often hinges on the quality of query rewriting. Prompt-based rewriting methods are frequently suboptimal, while existing reinforcement learning (RL) approaches struggle with inefficient, unguided exploration of the vast strategy space. To address these limitations, we propose an end-to-end RL framework that initializes training with human-defined prior rewriting strategies, enabling the model to learn from its interactions with the RAG environment and develop its own effective posterior rewriting strategies. Furthermore, we develop a novel RL algorithm, Block-wise Geometric Policy Optimization (BGPO), which resolves the granularity mismatch in previous methods by redefining the state-action space over blocks of tokens. The algorithm is enhanced with geometric averaging for balanced importance weighting and a Bellman-equation-inspired credit assignment mechanism that reshapes the reward. Extensive experiments on both local corpus retrieval and online search datasets demonstrate that our RL framework consistently surpasses the baselines, validating its superiority for complex RAG tasks. Our project code can be found at this anonymous repository: https://anonymous.4open.science/r/Learning-to-Think-in-Blocks-A-Prior-Guided-Reinforcement-Learning-Framework-for-RAG-0288/
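To make the "geometric averaging for balanced importance weighting" idea concrete, the minimal sketch below shows one plausible reading: the block-level importance ratio taken as the geometric mean of the per-token policy ratios within that block, i.e. the exponential of the mean log-ratio. All names (`blockwise_geometric_ratio`, `block_ids`, etc.) are hypothetical illustrations, not taken from the paper or its repository.

```python
import torch

def blockwise_geometric_ratio(logp_new: torch.Tensor,
                              logp_old: torch.Tensor,
                              block_ids: torch.Tensor) -> torch.Tensor:
    """Geometric mean of token-level importance ratios within each block.

    logp_new, logp_old: (T,) log-probabilities of the sampled tokens under
        the current and behavior policies.
    block_ids: (T,) int64 id of the block each token belongs to (0..B-1).
    Returns a (B,) tensor of block-level importance ratios.
    """
    log_ratio = logp_new - logp_old                     # (T,) per-token log ratios
    num_blocks = int(block_ids.max().item()) + 1
    # Sum log-ratios per block, then divide by block length: the exp of the
    # arithmetic mean of logs is the geometric mean of the token ratios.
    sums = torch.zeros(num_blocks).index_add_(0, block_ids, log_ratio)
    counts = torch.bincount(block_ids, minlength=num_blocks).float()
    return torch.exp(sums / counts)

if __name__ == "__main__":
    torch.manual_seed(0)
    logp_old = torch.randn(6)
    logp_new = logp_old + 0.1 * torch.randn(6)
    blocks = torch.tensor([0, 0, 0, 1, 1, 1])
    print(blockwise_geometric_ratio(logp_new, logp_old, blocks))
```

The appeal of a geometric mean, under this reading, is that the block-level ratio stays on a comparable scale regardless of block length, whereas a raw product of token ratios grows or shrinks exponentially with the number of tokens, which would let long blocks dominate the importance weight.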
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24440