Abstract: Large language models (LLMs) have demonstrated remarkable evaluation and critique capabilities, providing insightful feedback and identifying flaws across a variety of tasks. These critique capabilities have shown great potential for improving LLM performance. However, limited research has explored which types of critiques are most effective for improving model responses, or how to generate such critiques. To address this gap, we introduce Refinement-oriented Critique Optimization (RCO), a novel framework designed to train critic models using refinement signals. By evaluating refinement performance, RCO identifies critique strategies that are effective at improving model outputs and learns to generate such critiques. Extensive experiments demonstrate that RCO significantly outperforms conventional LLM-generated critiques in refining responses. Notably, RCO not only enhances the policy model used during training but also exhibits strong transferability, effectively aiding other models in response refinement. Our code and data will be publicly available upon acceptance of this paper.
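The abstract describes training a critic on refinement signals: candidate critiques are judged by how much the policy model's refined response improves under each critique. The following is a minimal sketch of that scoring idea, based only on the abstract; every name here (generate_critique, refine, reward) is a hypothetical placeholder, not the paper's actual implementation or API.

```python
# Hypothetical sketch of refinement-signal scoring, assuming a critic model,
# a policy model that refines a draft given a critique, and a reward function
# that scores refined responses. None of these names come from the paper.

from typing import Callable, List, Tuple


def rank_critiques_by_refinement(
    prompt: str,
    draft: str,
    generate_critique: Callable[[str, str], str],  # critic model (placeholder)
    refine: Callable[[str, str, str], str],        # policy-model refinement (placeholder)
    reward: Callable[[str, str], float],           # quality score for a refined response (placeholder)
    num_candidates: int = 4,
) -> List[Tuple[float, str, str]]:
    """Score candidate critiques by how much they improve the refined response."""
    scored = []
    for _ in range(num_candidates):
        critique = generate_critique(prompt, draft)
        refined = refine(prompt, draft, critique)
        scored.append((reward(prompt, refined), critique, refined))
    # Critiques whose refinements score highest can serve as preferred training
    # targets for the critic, which is the gist of the refinement signal.
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

The key design point suggested by the abstract is that critiques are selected by downstream refinement quality rather than by direct judgments of the critique text itself.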
Paper Type: Long
Research Area: Generation
Research Area Keywords: Generation, Language Modeling
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 1406