Keywords: Privacy, Safety, VLM web agents, Adversarial examples, Web agents, Vision-Language Model
Abstract: Recent progress in generalist web agents built on large multimodal models has enabled the automation of complex web tasks but has also created new security risks. We identify a new attack vector against web agents that, unlike prior work, does not require manipulating HTML elements. Our threat model focuses on marketplace websites, a primary target of generalist web agents, where users and sellers can upload images themselves. We propose AGENTCON, a practical attack that crafts adversarial perturbations on listing images only, rather than on the entire input as in traditional adversarial attacks, to induce web agents to take the attacker's intended target action. AGENTCON incorporates real-world constraints from webpage rendering into the optimization so that the attack remains effective when neighboring listings and the attack image’s position vary. Our evaluation on 1,680 tasks against a state-of-the-art web agent framework demonstrates the effectiveness of AGENTCON, with an average attack success rate (ASR) of 80.4% across four application scenarios and three agent models. AGENTCON also remains effective under common countermeasures, with an average ASR of 76%.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13176