AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

24 Sept 2024 (modified: 25 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: LLM-Agents, Multi-Agents, Large Language Models
TL;DR: AgentMonitor is a framework that captures agent-level inputs and outputs to generate statistics for training a regression model predicting task performance, while also enabling real-time corrections to mitigate security risks from malicious agents.
Abstract: The rapid advancement of large language models (LLMs) has catalyzed the emergence of LLM-based agents. Recent research has shifted from single-agent systems to multi-agent frameworks, demonstrating that collaboration can outperform the capabilities of individual LLMs. However, effectively pre-configuring a Multi-Agent System (MAS) for a specific task remains a challenging problem, with performance outcomes only observable after execution. Inspired by the well-established scaling laws in LLM development that model downstream task performance or validation loss as functions of various factors during training, we investigate the predictability of MAS performance. Specifically, we explore whether it is possible to predict the downstream task performance of a configured MAS. In addition, MASs face a growing challenge in ensuring reliable and trustworthy responses: the introduction of malicious agents can lead to the generation and spread of harmful content, which poses significant security risks. To address these issues, we introduce AgentMonitor, a framework that integrates with existing MASs at the agent level. AgentMonitor captures inputs and outputs at each step, which enables (1) transforming them into relevant statistics for training a regression model that predicts task performance and (2) applying on-the-fly corrections to mitigate negative impacts on final outcomes. Extensive experiments demonstrate that training a simple XGBoost model achieves a high Spearman rank correlation of 0.89 in an in-domain setting. In more challenging scenarios, where the statistics of a specific task or architecture are absent from the training set, our method maintains a moderate average correlation of 0.58. Furthermore, by employing AgentMonitor in a maliciously configured MAS, the system ultimately generates 6.2% less harmful content and 1.8% more helpful content on average, reducing safety risks and improving reliability.
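The abstract evaluates the performance predictor by the Spearman rank correlation between predicted and actual MAS task scores. As a reference for how that metric is computed, here is a minimal pure-Python sketch (in practice one would use `scipy.stats.spearmanr`; the variable names are illustrative, not from the paper):

```python
def ranks(values):
    """Assign 1-based average ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the block of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A perfectly monotone relation between predicted and actual scores yields 1.0; the paper's reported 0.89 (in-domain) indicates the predictor ranks MAS configurations close to their true ordering.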
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3493