Abstract: The rapid advancement of Large Language Models (LLMs) has enabled the development of AI agents for a wide range of applications. However, ensuring the trustworthiness of these LLM-based agents, encompassing aspects such as safety, robustness, and privacy, remains a critical challenge. While existing research predominantly addresses risks inherent to LLMs, the distinct vulnerabilities introduced by the design of agent systems, including their perception, action, and interaction mechanisms, remain insufficiently explored. These components expand the attack surface available to adversaries, amplifying risks that demand urgent research attention. In this position paper, we comprehensively analyze trustworthiness risks specific to LLM-based agents, emphasizing threats arising from agent-specific modules beyond those of standalone LLMs. Specifically, we summarize these risks across six dimensions, discuss potential mitigation strategies, and highlight gaps in current attacks and defenses. Although preliminary studies have identified some of these risks, we argue that the challenges stemming from agent systems remain underprioritized and insufficiently addressed. Based on these discussions, we advocate for greater research effort to bridge this gap and ensure the secure and responsible deployment of LLM-based agents in real-world scenarios.