Abstract: Critique ability, a meta-cognitive capability of humans, is challenging for LLMs to acquire and improve. While human annotation can effectively enhance critique ability, most recent works rely on supervised fine-tuning (SFT) with critiques generated by a single LLM, such as GPT-4, which is more scalable and cost-effective.
However, such model-generated critiques often suffer from inherent flaws due to the complexity of critique. Consequently, fine-tuning LLMs on these flawed critiques not only limits performance but also propagates errors into the learned model.
To address this issue, we propose \textbf{MultiCritique}, a unified data generation framework that leverages multi-agent feedback to improve the critique ability of LLMs in both the SFT and reinforcement learning (RL) stages.
In the SFT stage, MultiCritique aggregates high-quality multi-agent critiques through a fine-grained meta-critique mechanism.
In the RL stage, preference critiques are constructed and refined by validating their contributions to multi-agent revisions, thereby enhancing the effectiveness of RL in improving critique ability.
Based on MultiCritique, we construct SFT and RL datasets. Extensive experimental results on two benchmarks highlight the key benefits of our datasets, including superior quality, enhanced data efficiency, strong generalization to unseen tasks, and improvements in the general capabilities of LLMs.
Notably, our fine-tuned 7B model significantly surpasses advanced 7B-13B models and approaches the performance of advanced 70B LLMs and GPT-4.
Code and datasets will be publicly available.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Large Language Models, Critique Ability, Multi-agent framework
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1188