Keywords: AI Safety, Multi-Agent Systems, Human-AI Interaction, Social Simulation
TL;DR: We built a sandbox simulation system to evaluate AI agent safety issues in a multi-turn setting.
Abstract: To address the growing safety risks as AI agents become increasingly autonomous in their interactions with human users and environments, we present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM features a modular sandbox environment that simulates multi-turn interactions between users and AI agents. We then develop a comprehensive multi-dimensional evaluation framework that uses metrics covering operational, content-related, societal, and legal risks to examine the safety of AI agents in these interactions. Through running over 8K simulations based on 132 scenarios across seven domains (e.g., healthcare, finance, education), we show that state-of-the-art LLMs exhibit safety risks in 62% of cases, particularly during tool use with malicious users, highlighting the importance of evaluating and addressing AI agent safety in dynamic human-AI-environment interactions.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1450
Loading