Abstract: Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) but often falter when initial search queries fail. Existing approaches typically focus on query formulation or reasoning over results, lacking mechanisms to explicitly encourage persistence after a search failure. We introduce ReZero (Retry-Zero), a novel framework employing Group Relative Policy Optimization (GRPO) reinforcement learning to address this. ReZero incorporates a specific reward component, \texttt{reward\_retry}, that directly incentivizes the LLM to retry search queries following an unsuccessful initial attempt, conditional on successful final answer generation. Experiments on the Apollo 3 mission dataset demonstrate ReZero's effectiveness: it achieved a peak accuracy of 46.88\%, significantly outperforming a 25.00\% baseline trained without the retry incentive. This highlights that rewarding persistence enhances LLM robustness in information-seeking scenarios where initial queries may prove insufficient.
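The retry-conditioned reward described above can be sketched minimally. This is an illustrative reconstruction, not the paper's implementation: the function name `reward_retry` comes from the abstract, but the signature, the per-retry bonus, and the cap are hypothetical choices made here for concreteness.

```python
def reward_retry(num_search_calls: int, answer_correct: bool) -> float:
    """Illustrative retry reward: bonus for extra search attempts,
    granted only when the rollout's final answer is correct."""
    if not answer_correct:
        return 0.0  # per the abstract, the incentive is conditional on a correct answer
    if num_search_calls <= 1:
        return 0.0  # no bonus for a single-shot success; only retries are rewarded
    # Diminishing, capped bonus per additional attempt (constants are assumptions)
    bonus_per_retry = 0.1
    return min(bonus_per_retry * (num_search_calls - 1), 0.5)

print(reward_retry(3, True))   # 0.2 under these illustrative constants
print(reward_retry(3, False))  # 0.0: retries alone earn nothing
```

The cap guards against reward hacking, where the policy would otherwise issue gratuitous retries to inflate its return; in a GRPO setup this term would be summed with the answer-correctness reward before computing group-relative advantages.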
Paper Type: Short
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval, document representation, reinforcement learning, retrieval-augmented generation, fine-tuning
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 75