Keywords: multi-agent debate, memory selection, robustness
Abstract: Large language models (LLMs) have demonstrated impressive capabilities on a variety of language-based reasoning tasks (e.g., mathematical reasoning). Among LLM reasoning frameworks, _multi-agent debate_ (MAD), which employs multiple LLM agents that reason through multiple rounds of debate, has emerged as a powerful paradigm: it allows agents to access the memories of previous rounds to iteratively refine their reasoning, and it helps LLMs alleviate their potential intrinsic self-preference bias. Although MAD significantly improves the reasoning capabilities of LLMs, in this paper we theoretically demonstrate that the performance of MAD is closely tied to the quality of its memories. This implies that MAD remains vulnerable to erroneous reasoning memories, which threatens its robustness. To address this problem, we introduce a simple yet effective framework, _multi-agent debate with memory masking_ (MAD-M$^2$), which enhances the robustness of MAD by allowing LLM agents to select memories from the previous debate round before reasoning in the current round. In this way, MAD-M$^2$ refines the contextual information at the beginning of each debate round, preserving as many informative and meaningful memories as possible while discarding noisy ones, and in turn achieves better reasoning results. Extensive empirical results on several mainstream mathematical and logical reasoning benchmarks demonstrate that MAD-M$^2$ achieves better results than typical MAD.
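To make the described mechanism concrete, here is a minimal sketch of a debate loop with a memory-masking step, reconstructed only from the abstract above; all helper names (`debate_with_memory_masking`, `judge`, the agent callables) are hypothetical illustrations under my own assumptions, not the authors' implementation.

```python
# Sketch of multi-agent debate with per-round memory selection (MAD-M^2 idea):
# before each round, keep only the previous-round memories judged informative.
from typing import Callable, List


def debate_with_memory_masking(
    question: str,
    agents: List[Callable[[str], str]],  # each agent maps a prompt to an answer
    judge: Callable[[str, str], bool],   # hypothetical memory-quality filter
    num_rounds: int = 3,
) -> List[str]:
    memories: List[str] = []  # answers produced in the previous round
    answers: List[str] = []
    for _ in range(num_rounds):
        # Memory masking: drop memories the judge flags as noisy, keep the rest.
        kept = [m for m in memories if judge(question, m)]
        context = "\n".join(f"Previous-round memory: {m}" for m in kept)
        prompt = f"{context}\nQuestion: {question}\nGive your answer."
        answers = [agent(prompt) for agent in agents]
        memories = answers  # this round's answers become the next round's memories
    return answers
```

The key design point, per the abstract, is that selection happens at the start of each round over the previous round's memories; how the judge is realized (e.g., by an LLM agent itself) is not specified here and is left abstract.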
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9459