Keywords: Framework, Benchmark, Multi-agent systems, Blackboards, Prompt Injection, Contextual Integrity, LLM Attacks
Abstract: A multi-agent system (MAS) powered by large language models (LLMs) could automate tedious user tasks that require collaboration, such as meeting scheduling. LLMs enable more nuanced protocols that account for unstructured private data and users' personal constraints.
However, this design exposes these systems to new risks,
from misalignment to attacks by malicious parties that compromise agents or steal user data.
In this paper, we propose Terrarium, a framework for fine-grained
studies of safety, privacy, and security in multi-agent systems.
We repurpose the blackboard design,
an early approach in multi-agent systems, to create a modular and configurable playground to support
multi-agent collaborative tasks using LLMs.
We then identify key attack
vectors, including misalignment, malicious agents, compromised communication, and
poisoned data.
We implement three scenarios and add four representative attacks, demonstrating the flexibility of our framework.
Terrarium provides the necessary tools to study and quickly iterate on new designs,
advancing our community's efforts toward trustworthy multi-agent systems.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13619