VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents

ICLR 2026 Conference Submission 19136 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Web Agent, Attack, Computer-Use Agent, Browser-Use Agent, Dataset, Benchmark

TL;DR: We introduce VPI-Bench, a benchmark demonstrating that Visual Prompt Injection can manipulate Computer-Use Agents and Browser-Use Agents with success rates of up to 51% and 100%, respectively, underscoring the need for robust defenses.

Abstract: Computer-Use Agents (CUAs) with full system access enable powerful task automation but pose significant security and privacy risks because they can manipulate files, access user data, and execute arbitrary commands. While prior work has focused on browser-based agents and HTML-level attacks, the vulnerabilities of CUAs remain underexplored. In this paper, we propose an end-to-end threat model in which Visual Prompt Injection (VPI) manipulates CUAs in black-box settings to perform unauthorized actions or leak sensitive information, capturing the entire attack chain from injection to harmful outcome. We then introduce VPI-Bench, a benchmark of 306 test cases across five widely used platforms, to evaluate agent robustness under VPI threats. Each test case is a variant of a web platform that is interactive, deployed in a realistic environment, and contains a visually embedded malicious prompt. Our empirical study shows that current CUAs and Browser-Use Agents (BUAs) can be deceived at rates of up to 51% and 100%, respectively, on certain platforms. The experimental results also indicate that existing defense methods offer only limited improvements. These findings highlight the need for robust, context-aware defenses to ensure the safe deployment of multimodal AI agents in real-world environments.

Primary Area: datasets and benchmarks

Submission Number: 19136
