You Don't Have to Be the One Doing Evil! Locate a Scapegoat within Multi-Agent Systems for Executing Covert Attacks

20 Sept 2025 (modified: 03 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Multi-Agent Systems, LLM-based Agent Safety, Covert Attacks
Abstract: While LLM-based Multi-Agent Systems (MAS) excel at complex tasks through collaboration, their interactive processes introduce security vulnerabilities. Existing research has shown that malicious agents can disrupt task execution by injecting prompts or manipulating the shared memory pool. However, these methods typically require the malicious agent to perform direct and traceable actions, making them susceptible to detection by auditors through log reviews. Unlike prior approaches, this paper explores a more covert attack strategy, the *Scapegoating* attack, in which the malicious agent induces downstream agents in the MAS to output content that sabotages task completion, rather than performing the sabotage itself. To conduct such a covert attack, we design SGoatMAS, a systemic chain-poisoning framework. The core idea is to view the entire multi-agent workflow as an integrated "System-level Chain of Thought" and to propose a new metric, the Vulnerability-to-Risk ratio, which identifies both the most vulnerable link and the most plausible scapegoat agent in the chain. The attack is then conducted on the selected link and agent. Extensive experiments demonstrate that SGoatMAS not only achieves strong attack performance but also maintains exceptionally low detection rates. This work points to a new direction for future research into the security and defense of Multi-Agent Systems.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24392