Keywords: Large Language Models, Multi-agent reasoning, Answer verification, Efficiency and scalability, Prompting techniques
TL;DR: We propose MARS, a multi-agent collaboration framework that achieves the reasoning quality of multi-agent debate while cutting resource consumption by ~50%.
Abstract: Large language models (LLMs) have achieved impressive results in natural language understanding, yet their reasoning capabilities remain limited when operating as single agents. Multi-Agent Debate (MAD) has been proposed to address this limitation by enabling collaborative reasoning among multiple models in a round-table debate manner. While effective, MAD introduces substantial computational overhead due to the number of agents involved and the frequent communication required. In this paper, we propose MARS (Multi-Agent Review System), a role-based collaboration framework inspired by the academic review process. In MARS, an author agent generates an initial solution, reviewer agents provide decisions and comments independently, and a meta-reviewer integrates the feedback to make the final decision and guide further revision. This design enhances reasoning quality while avoiding costly reviewer-to-reviewer interactions, thereby controlling token consumption and inference time. We compare MARS with both MAD and other state-of-the-art reasoning strategies across multiple benchmarks. Extensive experiments with different LLMs show that MARS matches the accuracy of MAD while reducing both token usage and inference time by approximately 50%. Code is available at https://anonymous.4open.science/r/ICLR2026-submit-F7B0/README.md.
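The review loop described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration of the author / reviewer / meta-reviewer roles, not the authors' implementation: `call_llm`, the prompts, and the accept/revise convention are all assumptions for the sake of the sketch.

```python
def call_llm(prompt: str) -> str:
    # Stub: replace with a real LLM completion call (any API client).
    return f"[response to: {prompt[:40]}...]"

def mars_solve(question: str, n_reviewers: int = 3, max_rounds: int = 2) -> str:
    # Author agent drafts an initial solution.
    solution = call_llm(f"Author: solve the problem.\nProblem: {question}")
    for _ in range(max_rounds):
        # Reviewers comment independently -- no reviewer-to-reviewer
        # exchange, which is what limits token usage relative to debate.
        reviews = [
            call_llm(f"Reviewer {i}: give a decision (accept/revise) and "
                     f"comments.\nProblem: {question}\nSolution: {solution}")
            for i in range(n_reviewers)
        ]
        # Meta-reviewer integrates the independent reviews into one decision.
        meta = call_llm("Meta-reviewer: integrate the reviews and decide "
                        f"accept or request revision.\nReviews: {reviews}")
        if "accept" in meta.lower():
            break
        # Author revises guided only by the meta-review.
        solution = call_llm(f"Author: revise the solution using this "
                            f"feedback.\nFeedback: {meta}\n"
                            f"Previous solution: {solution}")
    return solution
```

Because each round issues one author call, `n_reviewers` independent reviewer calls, and one meta-reviewer call, total calls grow linearly in the number of reviewers rather than quadratically in agent-to-agent exchanges as in round-table debate.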
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14542