Terrarium: Revisiting the Blackboard for Studying Multi-Agent Attacks

ICLR 2026 Conference Submission 13619 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Framework, Benchmark, Multi-agent systems, Blackboards, Prompt Injection, Contextual Integrity, LLM Attacks
Abstract: A multi-agent system (MAS) powered by large language models (LLMs) could automate tedious user tasks that require collaboration, such as meeting scheduling. LLMs enable more nuanced protocols that account for unstructured private data and users' personal constraints. However, this design exposes these systems to new problems, from misalignment to attacks by malicious parties that compromise agents or steal user data. In this paper, we propose Terrarium, a framework for fine-grained studies of safety, privacy, and security in MAS. We repurpose the blackboard design, an early approach in multi-agent systems, to create a modular and configurable playground that supports multi-agent collaborative tasks using LLMs. We then identify key attack vectors, including misalignment, malicious agents, compromised communication, and poisoned data. We implement three scenarios and add four representative attacks, demonstrating the flexibility of our framework. Terrarium provides the necessary tools to study and quickly iterate over new designs, and can further advance our community's efforts toward trustworthy multi-agent systems.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13619