MASpi: A Unified Environment for Evaluating Prompt Injection Robustness in LLM-Based Multi-Agent Systems

Hengyu An; Minxi Li; Jinghuai Zhang; Naen Xu; Chunyi Zhou; Changjiang Li; Tianyu Du; Shouling Ji

02 Sept 2025 (modified: 22 Dec 2025); ICLR 2026 Conference Withdrawn Submission; CC BY 4.0

Keywords: Large Language Models, Multi-Agent Systems, Prompt Injection, Benchmark

Abstract: LLM-based Multi-Agent Systems (LLM-MAS) leverage inter-agent collaboration to tackle complex tasks, yet the dense interactions among agents also make them vulnerable to prompt injection attacks. Such attacks often originate from a few compromised agents and rapidly propagate across the system, posing significant security threats. Existing studies mainly focus on a limited set of attack strategies and rely on researcher-specific implementations of LLM-MAS, which makes it difficult to adapt attacks across different systems and hinders comprehensive evaluation. To bridge this gap, we introduce MASpi, a unified environment for evaluating the prompt injection robustness of LLM-MAS. MASpi offers systematic evaluation suites spanning multiple attack surfaces (i.e., external inputs, agent profiles, inter-agent messages) and attack objectives (i.e., instruction hijacking, task disruption, information disclosure). Specifically, MASpi provides interfaces for executing 23 prompt injection attacks tailored to LLM-MAS. Its modular design enables researchers to easily integrate new LLM-MAS approaches and develop novel attack strategies on top of it. Our benchmarking results reveal that increasing the topological complexity of LLM-MAS does not guarantee security. Instead, the risks are distributed across agents, with the most harmful agent varying depending on the specific attack objective. Moreover, defenses designed for single-agent prompt injection do not reliably transfer to LLM-MAS; in fact, narrowly scoped defenses may inadvertently increase vulnerabilities to other types of attacks. MASpi aims to provide a solid foundation for the community to advance deeper exploration of security design principles in LLM-MAS.

Supplementary Material: zip

Primary Area: datasets and benchmarks

Submission Number: 887
