Sabotage from Within: Analyzing the Vulnerability of LLM Multi-Agent Systems to Infiltration

Published: 10 Jan 2026, Last Modified: 10 Jan 2026, LaMAS 2026 Poster, CC BY 4.0
Keywords: LLM-based Multi-Agent Systems, Adversarial Infiltration, Red Teaming, System Security, Cooperative AI, Trust and Safety, Multi-Agent Coordination
TL;DR: We expose critical security vulnerabilities in LLM multi-agent systems through adversarial infiltration, demonstrating that optimized attacks achieve an 82% success rate with 69% stealth, and evaluating defense mechanisms that are effective but costly.
Abstract: LLM-based Multi-Agent Systems (LaMAS) represent a paradigm shift in complex problem-solving, yet their security implications remain largely unexplored. This paper investigates a critical vulnerability: adversarial infiltration within cooperative agent teams. We propose a novel red-teaming framework in which a strategically optimized adversarial agent infiltrates a LaMAS to sabotage collaborative tasks while evading detection. Through comprehensive experiments on the TruthfulQA and ToxiGen benchmarks, we demonstrate that our method achieves an 82% success rate in misinformation attacks with only a 31% detection rate, significantly outperforming traditional red-teaming approaches. Our analysis reveals that managerial roles pose the greatest threat when compromised, and that larger teams exhibit increased vulnerability despite their complexity. We evaluate multiple defense mechanisms, finding that combined approaches can mitigate 78% of attacks but incur an 84% overhead. These findings highlight the urgent need for robust security frameworks in LaMAS deployments and establish foundational principles for responsible multi-agent system design.
Submission Number: 18