Abstract: Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) but often falter when initial search queries fail. Existing approaches typically focus on query formulation or reasoning over results, lacking mechanisms to explicitly encourage persistence after a search failure. We introduce ReZero (Retry-Zero), a novel framework employing Group Relative Policy Optimization (GRPO) reinforcement learning to address this. ReZero incorporates a specific reward component, \texttt{reward\_retry}, that directly incentivizes the LLM to retry search queries following an unsuccessful initial attempt, conditional on successful final answer generation. Experiments on the Apollo 3 mission dataset demonstrate ReZero's effectiveness: it achieved a peak accuracy of 46.88\%, significantly outperforming a 25.00\% baseline trained without the retry incentive. This highlights that rewarding persistence enhances LLM robustness in information-seeking scenarios where initial queries may prove insufficient.
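The retry-conditioned reward described above can be sketched minimally. This is an illustrative reconstruction, not the paper's implementation: the function name `reward_retry` comes from the abstract, but the signature, the per-retry bonus, and the cap are hypothetical choices made here for concreteness.

```python
def reward_retry(num_search_calls: int, answer_correct: bool) -> float:
    """Illustrative retry reward: bonus for extra search attempts,
    granted only when the rollout's final answer is correct."""
    if not answer_correct:
        return 0.0  # per the abstract, the incentive is conditional on a correct answer
    if num_search_calls <= 1:
        return 0.0  # no bonus for a single-shot success; only retries are rewarded
    # Diminishing, capped bonus per additional attempt (constants are assumptions)
    bonus_per_retry = 0.1
    return min(bonus_per_retry * (num_search_calls - 1), 0.5)

print(reward_retry(3, True))   # 0.2 under these illustrative constants
print(reward_retry(3, False))  # 0.0: retries alone earn nothing
```

The cap guards against reward hacking, where the policy would otherwise issue gratuitous retries to inflate its return; in a GRPO setup this term would be summed with the answer-correctness reward before computing group-relative advantages.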
Paper Type: Short
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval, document representation, reinforcement learning, retrieval-augmented generation, fine-tuning
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 75