Keywords: multi-agent debate; mathematical reasoning
TL;DR: A novel way to conduct multi-agent debate using ranking
Abstract: Recent large language models (LLMs) are trained on diverse corpora and tasks, leading them to develop complementary strengths. Multi-agent debate (MAD) has emerged as a popular way to leverage these strengths for robust reasoning, though it has mostly been applied to language-only tasks, leaving its efficacy on multimodal problems underexplored. In this paper, we study MAD for multimodal reasoning. Our setup generalizes the debate protocol to heterogeneous experts that possess single- and multi-modal capabilities. We evaluate our method on several mathematical and visual reasoning datasets. Our results show that our method consistently improves accuracy over state-of-the-art MAD setups and aggregation methods across diverse tasks and LLM configurations.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9887