Why Do Multiagent Systems Fail?

Published: 05 Mar 2025, Last Modified: 25 Apr 2025, BuildingTrust, CC BY 4.0
Track: Long Paper Track (up to 9 pages)
Keywords: multi-agent systems, large language models, llm, compound ai systems, agents, ai, tool calling
TL;DR: We provide a systematic analysis of why multiagent systems fail.
Abstract:

Despite growing enthusiasm for Multi-Agent Systems (MAS), where multiple LLM agents collaborate to accomplish tasks, their performance gains across popular benchmarks remain minimal compared to single-agent frameworks. This gap highlights the need to analyze the challenges hindering MAS effectiveness. In this paper, we conduct the first comprehensive study of MAS challenges, covering five popular MAS and more than 150 tasks. Four expert human annotators study the MAS execution traces and identify 18 fine-grained failure modes, from which we propose a comprehensive failure taxonomy applicable across systems. We group these fine-grained failure modes into four key categories: (i) specification ambiguities and misalignment, (ii) organizational breakdowns, (iii) inter-agent conflict and coordination gaps, and (iv) weak verification and quality control. To understand whether these failure modes could have been easily avoided, we propose two interventions: improved agent role specification and orchestration strategies. We find that the identified failures require more involved solutions, and we outline a roadmap for future research in this space. To support better development of MAS, we will open-source our dataset, including the agent conversation traces and human annotations.
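
As a rough illustration of how annotations under the four categories above might be organized, the following Python sketch tallies annotated failures per category. It is not the authors' tooling: the names `FailureCategory` and `tally_by_category`, the trace identifiers, and the example annotations are all hypothetical and chosen only for illustration. The abstract does not name the 18 fine-grained failure modes, so the sketch works at the category level only.

```python
from collections import Counter
from enum import Enum


class FailureCategory(Enum):
    """The four top-level categories listed in the abstract."""
    SPECIFICATION = "specification ambiguities and misalignment"
    ORGANIZATION = "organizational breakdowns"
    COORDINATION = "inter-agent conflict and coordination gaps"
    VERIFICATION = "weak verification and quality control"


def tally_by_category(annotations):
    """Count annotated failures per category.

    `annotations` is an iterable of (trace_id, FailureCategory) pairs.
    Categories with no annotations are reported with a count of 0.
    """
    counts = Counter(category for _, category in annotations)
    return {category: counts.get(category, 0) for category in FailureCategory}


# Hypothetical annotations for illustration only; not data from the paper.
example_annotations = [
    ("trace-001", FailureCategory.SPECIFICATION),
    ("trace-001", FailureCategory.VERIFICATION),
    ("trace-002", FailureCategory.COORDINATION),
    ("trace-003", FailureCategory.VERIFICATION),
]

summary = tally_by_category(example_annotations)
for category, count in summary.items():
    print(f"{category.value}: {count}")
```

A single trace can carry several annotations, which is why the sketch stores (trace, category) pairs rather than one label per trace.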

Submission Number: 70