Track: Long Paper Track (up to 9 pages)
Keywords: Language Agents, Benchmarks, Web Agents, AI Safety, Robustness
TL;DR: A benchmark and environment that stress tests the robustness of web agents in realistic online environments.
Abstract: Recent advances in language model (LM) agents and tool calling have enabled autonomous, iterative systems to emulate digital behavior in a variety of environments. To better understand the instruction-following limitations of LM agents, we introduce WebGauntlet, a benchmark that stress tests the robustness of web agents in realistic online environments. Our environment replicates online e-commerce settings in which agents navigate and perform simple tasks for users. Our threat model concretizes dozens of environment-side attacks and reveals that LM agents struggle to navigate past even simple adversarial content: our strongest threats average an attack success rate (ASR) of 98.92%. We analyze trajectories to explore the failures of web agents and better understand vision-language model (VLM) limitations. WebGauntlet supports the study of agent safety, demonstrating the gaps in performance across a spectrum of adversarial and safe environments.
Submission Number: 56