Can a Single-Agent Manipulate the Collective Decisions of Multi-Agents?

ACL ARR 2025 February Submission4069 Authors

15 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: Individual Large Language Models (LLMs) have demonstrated significant capabilities across various domains, such as healthcare and law. Recent studies also show that coordinated multi-agent systems exhibit enhanced decision-making and reasoning abilities through collaboration. However, because individual LLMs are vulnerable and an attacker can rarely access every agent in a multi-agent system, a key question arises: can an attacker manipulate the collective decision of a multi-agent system by accessing only a single agent? To explore this question, we formulate it as a game with incomplete information, in which the attacker knows only one agent and lacks full knowledge of the other agents in the system. Based on this formulation, we propose M-Spoiler, a framework that simulates agent interactions within a multi-agent system to generate adversarial samples. These samples are then used to attack the target system and mislead its collaborative decision-making process. More specifically, M-Spoiler introduces a stubborn agent that actively optimizes adversarial samples by simulating potential stubborn responses from agents in the target system, which makes the generated samples more effective at misleading the system. Through extensive experiments across various tasks, our findings confirm the risk posed by an attacker who knows only a single agent in a multi-agent system and demonstrate the effectiveness of our framework. We also explore several defense mechanisms and show that our attack framework remains more potent than the baselines, underscoring the need for further research into defensive strategies.

Paper Type: Long

Research Area: Ethics, Bias, and Fairness

Research Area Keywords: AI Safety, Multi-Agent

Contribution Types: Model analysis & interpretability

Languages Studied: English

Submission Number: 4069
