Submission Track: Paper Track (up to 8 pages)
Keywords: Data poisoning, backdoor attack, fine-tuning
TL;DR: AI agents are vulnerable to data poisoning during fine-tuning: modifying just 5% of collected traces can embed stealthy backdoors that leak confidential user information.
Abstract: The rise of AI agents that can use tools, browse the web, and interact with computers on behalf of a user has sparked strong interest in improving these capabilities by explicitly fine-tuning the LLMs/VLMs that power these agents. Several researchers have proposed collecting data by letting agents interact with their environment (e.g., a computer operating system, the web, or a collection of APIs exposed as tools) and improving agent performance by fine-tuning on this data. In this work, we show that such data collection can be manipulated by adversaries to insert poisoned traces. By modifying just 5% of collected traces, adversaries can embed stealthy malicious behaviors into agents, such as leaking confidential user information whenever a tool or webpage exposes a trigger. Our results raise important security concerns in the development of AI agents and underscore the importance of careful scrutiny of all data collection processes used to improve agentic AI.
Submission Number: 26