Keywords: Retrieval-Augmented Generation, Reinforcement Learning, Prior-Guided Learning, Structured Action Space, Query Rewriting
Abstract: Retrieval-Augmented Generation (RAG) systems mitigate factual inaccuracies in large language models (LLMs) by integrating external knowledge, but their effectiveness often hinges on the quality of query rewriting. Prompt-based rewriting methods are frequently suboptimal, while existing reinforcement learning (RL) approaches struggle with inefficient, unguided exploration of the vast strategy space. To address these limitations, we propose an end-to-end RL framework that initializes training with human-defined prior rewriting strategies, enabling the model to learn from its interactions with the RAG environment and develop its own effective posterior rewriting strategies. Furthermore, we develop a novel RL algorithm, Block-wise Geometric Policy Optimization (BGPO), which resolves the granularity mismatch in previous methods by redefining the state-action space over blocks of tokens. The algorithm is enhanced with geometric averaging for balanced importance weighting and a Bellman-equation-inspired credit assignment mechanism that reshapes the reward. Extensive experiments on both local corpus retrieval and online search datasets demonstrate that our RL framework consistently surpasses the baselines, validating its superiority for complex RAG tasks. Our project code can be found at this anonymous repository: https://anonymous.4open.science/r/Learning-to-Think-in-Blocks-A-Prior-Guided-Reinforcement-Learning-Framework-for-RAG-0288/
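To make the "geometric averaging for balanced importance weighting" idea concrete, the minimal sketch below shows one plausible reading: the block-level importance ratio taken as the geometric mean of the per-token policy ratios within that block, i.e. the exponential of the mean log-ratio. All names (`blockwise_geometric_ratio`, `block_ids`, etc.) are hypothetical illustrations, not taken from the paper or its repository.

```python
import torch

def blockwise_geometric_ratio(logp_new: torch.Tensor,
                              logp_old: torch.Tensor,
                              block_ids: torch.Tensor) -> torch.Tensor:
    """Geometric mean of token-level importance ratios within each block.

    logp_new, logp_old: (T,) log-probabilities of the sampled tokens under
        the current and behavior policies.
    block_ids: (T,) int64 id of the block each token belongs to (0..B-1).
    Returns a (B,) tensor of block-level importance ratios.
    """
    log_ratio = logp_new - logp_old                     # (T,) per-token log ratios
    num_blocks = int(block_ids.max().item()) + 1
    # Sum log-ratios per block, then divide by block length: the exp of the
    # arithmetic mean of logs is the geometric mean of the token ratios.
    sums = torch.zeros(num_blocks).index_add_(0, block_ids, log_ratio)
    counts = torch.bincount(block_ids, minlength=num_blocks).float()
    return torch.exp(sums / counts)

if __name__ == "__main__":
    torch.manual_seed(0)
    logp_old = torch.randn(6)
    logp_new = logp_old + 0.1 * torch.randn(6)
    blocks = torch.tensor([0, 0, 0, 1, 1, 1])
    print(blockwise_geometric_ratio(logp_new, logp_old, blocks))
```

The appeal of a geometric mean, under this reading, is that the block-level ratio stays on a comparable scale regardless of block length, whereas a raw product of token ratios grows or shrinks exponentially with the number of tokens, which would let long blocks dominate the importance weight.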
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24440