Keywords: Multi-Agent Debate, Large Language Model, Preference Optimization
TL;DR: We train a 2-agent LLM team (one actor-agent and one critic-agent) to collaboratively solve problems.
Abstract: Large language models (LLMs) have demonstrated a remarkable ability to serve as general-purpose tools on various language-based tasks.
Recent work has shown that the efficacy of such models can be improved through iterative dialog between multiple models, frequently referred to as multi-agent debate (MAD).
While debate shows promise as a means of improving model efficacy, most works in this area treat debate as an emergent behavior, rather than a learned behavior.
As a result, current debate frameworks rely on collaborative behaviors having already been sufficiently instilled in off-the-shelf models.
To address this limitation, we propose ACC-Debate, an Actor-Critic based learning framework to produce a two-agent team specialized in debate.
We demonstrate that ACC-Debate outperforms state-of-the-art (SotA) debate techniques on a wide array of benchmarks.
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12819