Abstract: Training critique models to assess and provide feedback on model outputs is a promising way to improve large language models (LLMs) on complex reasoning tasks. However, existing approaches typically rely on stronger supervisors to annotate critique data. To address this, we propose Critique-RL, an online RL framework for developing critique models without stronger supervision. Our framework operates on a two-player paradigm: the actor generates a response, the critic provides feedback, and the actor refines the response accordingly. We first reveal that relying solely on indirect reward signals from the actor’s outputs for RL optimization often yields unsatisfactory critics: while their helpfulness improves, their discriminability remains poor, resulting in marginal performance gains. To overcome this, Critique-RL adopts a two-stage optimization strategy. In stage I, it reinforces the discriminability of the critic with direct rule-based reward signals; in stage II, it introduces indirect rewards based on actor refinement to improve the critic’s helpfulness, while maintaining its discriminability via appropriate regularization. Extensive experiments across various models and tasks demonstrate that Critique-RL delivers substantial performance improvements.
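The two-stage reward structure described in the abstract can be illustrated with a minimal Python sketch. All names here (`stage1_reward`, `stage2_reward`, `alpha`, the boolean judgment flags) are hypothetical placeholders under stated assumptions, not the authors' implementation; the sketch only shows how a direct rule-based discriminability reward (stage I) and an indirect refinement-based helpfulness reward with regularization (stage II) could be combined.

```python
# Minimal sketch of the two-stage Critique-RL reward idea.
# All functions and parameters below are hypothetical stand-ins,
# not the paper's actual reward implementation.

def stage1_reward(critic_says_correct: bool, response_is_correct: bool) -> float:
    """Stage I: direct rule-based reward for discriminability.
    The critic is rewarded when its correct/incorrect judgment
    matches the ground-truth correctness of the actor's response."""
    return 1.0 if critic_says_correct == response_is_correct else 0.0


def stage2_reward(critic_says_correct: bool,
                  response_is_correct: bool,
                  refined_is_correct: bool,
                  alpha: float = 0.5) -> float:
    """Stage II: indirect reward from actor refinement (helpfulness),
    plus a regularization term reusing the stage-I signal so the
    critic's discriminability is maintained."""
    helpfulness = 1.0 if refined_is_correct else 0.0
    discriminability = stage1_reward(critic_says_correct, response_is_correct)
    return helpfulness + alpha * discriminability


# Example: the critic correctly flags a wrong answer and the actor's
# refinement fixes it, so both reward components fire.
print(stage2_reward(critic_says_correct=False,
                    response_is_correct=False,
                    refined_is_correct=True))  # 1.5
```

In an actual online RL loop, rewards of this form would be fed to a policy-gradient update of the critic while the actor generates and refines responses; the weighting between the two terms is an assumed design choice, not a value taken from the paper.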
Paper Type: Long
Research Area: Generation
Research Area Keywords: Critique models, LLM reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 8128