Track: Long Paper Track (up to 9 pages)
Keywords: Language Agents, Benchmarks, Web Agents, AI Safety, Robustness
TL;DR: A benchmark and environment that stress tests the robustness of web agents in realistic online environments.
Abstract: Recent advances in language model (LM) agents and tool calling have enabled autonomous, iterative systems to emulate digital behavior in a variety of environments. To better understand the instruction-following limitations of LM agents, we introduce WebGauntlet, a benchmark that stress tests the robustness of web agents in realistic online environments. Our environment replicates online e-commerce settings in which agents navigate and perform simple tasks for users. Our threat model concretizes dozens of environment-side attacks and reveals that LM agents struggle to navigate past even simple adversarial content: our strongest threats average an attack success rate (ASR) of 98.92%. We analyze trajectories to explore the failures of web agents and better understand vision-language model (VLM) limitations. WebGauntlet supports the study of agent safety, demonstrating the gaps in performance across a spectrum of adversarial and safe environments.
Submission Number: 56