HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

Xuhui Zhou; Hyunwoo Kim; Faeze Brahman; Liwei Jiang; Hao Zhu; Ximing Lu; Frank F. Xu; Bill Yuchen Lin; Yejin Choi; Niloofar Mireshghallah; Ronan Le Bras; Maarten Sap

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

Xuhui Zhou, Hyunwoo Kim, Faeze Brahman, Liwei Jiang, Hao Zhu, Ximing Lu, Frank F. Xu, Bill Yuchen Lin, Yejin Choi, Niloofar Mireshghallah, Ronan Le Bras, Maarten Sap

Published: 08 Jul 2025, Last Modified: 26 Aug 2025COLM 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: AI Safety, Multi-Agent Systems, Human-AI Interaction, Social Simulation

TL;DR: We built a sandbox simulation system to evaluate AI agent safety issues in a multi-turn setting.

Abstract: To address the growing safety risks as AI agents become increasingly autonomous in their interactions with human users and environments, we present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM features a modular sandbox environment that simulates multi-turn interactions between users and AI agents. We then develop a comprehensive multi-dimensional evaluation framework that uses metrics covering operational, content-related, societal, and legal risks to examine the safety of AI agents in these interactions. Through running over 8K simulations based on 132 scenarios across seven domains (e.g., healthcare, finance, education), we show that state-of-the-art LLMs exhibit safety risks in 62% of cases, particularly during tool use with malicious users, highlighting the importance of evaluating and addressing AI agent safety in dynamic human-AI-environment interactions.

Supplementary Material: zip

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 1450

Loading