Why Do Multiagent Systems Fail?

Melissa Z Pan; Mert Cemri; Lakshya A Agrawal; Shuyi Yang; Bhavya Chopra; Rishabh Tiwari; Kurt Keutzer; Aditya Parameswaran; Kannan Ramchandran; Dan Klein; Joseph E. Gonzalez; Matei Zaharia; Ion Stoica

Published: 05 Mar 2025, Last Modified: 25 Apr 2025 · BuildingTrust · CC BY 4.0

Track: Long Paper Track (up to 9 pages)

Keywords: multi-agent systems, large language models, llm, compound ai systems, agents, ai, tool calling

TL;DR: We provide a systematic analysis of why multiagent systems fail.

Abstract: Despite growing enthusiasm for Multi-Agent Systems (MAS), in which multiple LLM agents collaborate to accomplish tasks, their performance gains on popular benchmarks remain minimal compared to single-agent frameworks. This gap highlights the need to analyze the challenges that hinder MAS effectiveness. In this paper, we conduct the first comprehensive study of MAS challenges, covering 5 popular MAS frameworks and more than 150 tasks. Four expert human annotators studied the MAS execution traces and identified 18 fine-grained failure modes, from which we propose a comprehensive failure taxonomy applicable across systems. We group these fine-grained failure modes into four key categories: (i) specification ambiguities and misalignment, (ii) organizational breakdowns, (iii) inter-agent conflict and coordination gaps, and (iv) weak verification and quality control. To understand whether these failure modes could easily have been avoided, we propose two interventions: improved specification of agent roles and improved orchestration strategies. We find that the identified failures require more involved solutions than these interventions, and we outline a roadmap for future research in this space. To contribute towards better development of MAS, we will open-source our dataset, including the agent conversation traces and human annotations.
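As a rough illustration of how the taxonomy's two levels (fine-grained failure modes grouped under four top-level categories) might be represented when tallying human annotations, here is a minimal sketch. It is hypothetical and not the authors' annotation tooling: the `Annotation` fields, trace IDs, annotator names, and fine-grained mode labels are placeholders, since the abstract does not list the 18 modes or describe the dataset schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureCategory(Enum):
    """The four top-level categories named in the abstract."""
    SPECIFICATION = "specification ambiguities and misalignment"
    ORGANIZATION = "organizational breakdowns"
    COORDINATION = "inter-agent conflict and coordination gaps"
    VERIFICATION = "weak verification and quality control"


@dataclass(frozen=True)
class Annotation:
    """One human label on one trace (hypothetical schema)."""
    trace_id: str
    annotator: str
    fine_grained_mode: str  # placeholder label; the 18 modes are not listed here
    category: FailureCategory


def failures_per_category(annotations: list[Annotation]) -> Counter:
    """Count distinct (trace, fine-grained mode) failures per top-level category.

    Several annotators flagging the same failure on the same trace
    are deduplicated, so that failure is counted once.
    """
    seen = {(a.trace_id, a.fine_grained_mode, a.category) for a in annotations}
    return Counter(category for _, _, category in seen)


# Usage with made-up labels: two annotators agree on one failure in trace-1.
labels = [
    Annotation("trace-1", "A", "mode-x", FailureCategory.VERIFICATION),
    Annotation("trace-1", "B", "mode-x", FailureCategory.VERIFICATION),
    Annotation("trace-2", "A", "mode-y", FailureCategory.COORDINATION),
]
counts = failures_per_category(labels)
```

Deduplicating on (trace, mode) is one possible design choice for aggregating labels from multiple annotators; per-annotator counts would be the alternative if disagreement itself were the quantity of interest.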

Submission Number: 70
