It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Published: 23 May 2026, Last Modified: 23 May 2026
Venue: ICML 2026 AIWILD
License: CC BY 4.0
Keywords: web agents, browser agents, AI agents, benchmark, safety, security, evaluation, prompt injections, text injections
Abstract: Web-based agents powered by large language models are increasingly used for tasks such as email management and professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions embedded in interface elements that persuade agents to divert from their intended tasks. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation suite for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25% of tasks on average (ranging from 13% for GPT-5 to 43% for DeepSeek-R1). Small interface or contextual changes can double attack success rates, revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework and conduct controlled experiments on high-fidelity website clones, enabling future expansion of the benchmark.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 289