Cracking the Collective Mind: Adversarial Manipulation in Multi-Agent Systems

Fengyuan Liu; Rui Zhao; Guohao Li; Philip Torr; Lei Han; Jindong Gu

27 Sept 2024 (modified: 06 Mar 2025) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0

Keywords: Multi-Agent, AI Safety

Abstract: Large Language Models (LLMs) have demonstrated significant capabilities in domains such as healthcare, weather forecasting, finance, and law, and these works showcase the abilities of individual LLMs. Numerous recent studies have further shown that coordinated multi-agent systems achieve stronger decision-making and reasoning through collaboration. However, because individual LLMs are susceptible to various adversarial attacks, a key question arises: can an attacker manipulate the collective decision of such a system by accessing only a single agent? We formulate this problem as a game with incomplete information, in which agents lack full knowledge of the adversary's strategy. We then propose M-Spoiler, a framework that simulates a stubborn adversary in multi-agent debates during the training phase. Extensive experiments across various tasks confirm the risk of manipulation in multi-agent systems and demonstrate the effectiveness of our attack strategies. We also explore several defense mechanisms and find that our proposed attack remains more potent than existing baselines, underscoring the need for further research on defensive strategies.
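The central question above, whether a single stubborn agent can steer a group's collective decision, can be illustrated with a toy opinion-dynamics model. This sketch is not M-Spoiler and does not involve LLMs: it swaps in a DeGroot-style averaging debate, a standard model from the opinion-dynamics literature, purely for intuition. Each honest agent holds a belief that answer "A" is correct and, each round, keeps half of its own belief while moving the rest toward the mean of the other agents' beliefs. One stubborn agent never updates. The functions `debate` and `collective_decision` and the parameters `self_weight`, `rounds`, and `stubborn_belief` are hypothetical names introduced here.

```python
from typing import List, Optional


def debate(
    initial_beliefs: List[float],
    stubborn_index: Optional[int] = None,
    stubborn_belief: float = 0.0,
    rounds: int = 20,
    self_weight: float = 0.5,
) -> List[float]:
    """Toy DeGroot-style debate (hypothetical illustration, not M-Spoiler).

    Each belief is an agent's probability that answer "A" is correct.
    Honest agents keep `self_weight` of their own belief and move the rest
    toward the mean belief of all other agents. The stubborn agent, if any,
    always reports `stubborn_belief`.
    """
    beliefs = list(initial_beliefs)
    if stubborn_index is not None:
        beliefs[stubborn_index] = stubborn_belief
    n = len(beliefs)
    for _ in range(rounds):
        updated = []
        for i, own in enumerate(beliefs):
            if i == stubborn_index:
                updated.append(stubborn_belief)  # the adversary never concedes
                continue
            others_mean = (sum(beliefs) - own) / (n - 1)
            updated.append(self_weight * own + (1 - self_weight) * others_mean)
        beliefs = updated  # every agent updates from the previous round's beliefs
    return beliefs


def collective_decision(beliefs: List[float]) -> str:
    """Majority vote: each agent votes "A" if its belief exceeds 0.5."""
    votes_for_a = sum(1 for b in beliefs if b > 0.5)
    return "A" if votes_for_a > len(beliefs) / 2 else "B"


honest_start = [0.9, 0.9, 0.8, 0.85]
print(collective_decision(debate(honest_start)))  # no adversary -> A
print(collective_decision(debate(honest_start, stubborn_index=0)))  # agent 0 stubborn -> B
```

Without an adversary, the averaging step preserves the sum of beliefs, so all four agents converge to the initial mean (0.8625) and the group answers "A". When agent 0 instead holds belief 0.0 forever, the three honest agents' average shrinks by a factor of 5/6 each round, so after 20 rounds every vote flips and the group answers "B". Real LLM debates do not update by linear averaging, so this only conveys the intuition behind a stubborn adversary, not the paper's attack.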

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 10646

Loading