Keywords: LLM, Multi-Agent Systems
Abstract: LLM-based multi-agent systems (MAS) extend the capabilities of single LLMs by enabling cooperation among multiple specialized agents. However, most existing MAS frameworks rely on a single LLM to drive all agents, which caps the system's intelligence at the limits of that model. This paper explores the paradigm of heterogeneous LLM-driven MAS, which aims to raise the system's potential to the collective intelligence of diverse LLMs. We introduce X-MAS-Bench, a comprehensive testbed designed to evaluate the performance of various LLMs across different domains and MAS-related functions. In an extensive empirical study, we assess 28 LLMs across 5 domains (encompassing 21 test sets) and 5 functions, conducting over 1.7 million evaluations to identify the optimal model selection for each domain-function combination. Building on these findings, we demonstrate that transitioning from homogeneous to heterogeneous LLM-driven MAS can significantly enhance system performance without requiring structural redesign. Specifically, in a chatbot-only MAS scenario, the heterogeneous configuration yields up to a 6.4% performance improvement for MAS methods on the MATH dataset. In a mixed chatbot-reasoner scenario, the heterogeneous MAS achieves up to a 47% performance boost on the AIME dataset. Our results underscore the potential of heterogeneous LLMs in MAS and highlight a promising direction for future research in scalable, collaborative AI systems.
Primary Area: datasets and benchmarks
Submission Number: 18118
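The abstract describes identifying the best model for each domain-function combination from benchmark evaluations. A minimal sketch of that selection step is given below, assuming per-model scores are already available as (model, domain, function, accuracy) tuples. The function name `select_best_models`, the model names, the domain and function labels, and all score values are hypothetical placeholders for illustration; they are not taken from X-MAS-Bench or from the paper's results.

```python
def select_best_models(scores):
    """Return the highest-scoring model for each (domain, function) pair.

    scores: iterable of (model, domain, function, accuracy) tuples.
    Ties keep the model that appears first in the input.
    """
    best = {}
    for model, domain, function, accuracy in scores:
        key = (domain, function)
        # Replace the current choice only on a strictly higher score.
        if key not in best or accuracy > best[key][1]:
            best[key] = (model, accuracy)
    return {key: model for key, (model, _) in best.items()}


# Placeholder scores for illustration only (not results from the paper).
scores = [
    ("model_a", "domain_1", "function_1", 0.62),
    ("model_b", "domain_1", "function_1", 0.71),
    ("model_a", "domain_1", "function_2", 0.80),
    ("model_b", "domain_1", "function_2", 0.55),
]

assignment = select_best_models(scores)
for (domain, function), model in sorted(assignment.items()):
    print(f"{domain} / {function}: {model}")
```

Under this sketch, a heterogeneous MAS would assign each agent the model chosen for its function in the target domain, while a homogeneous MAS would use one model for every agent.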