It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

ICLR 2026 Conference Submission22043 Authors

19 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: web agents, browser agents, agent safety, agent hijacks, agent benchmark, prompt injections, text injections

Abstract: Web-based agents powered by Large Language Models are increasingly used for tasks such as email management or professional networking. However, their reliance on web content makes them vulnerable to hijacking attacks: adversarial instructions hidden in ordinary interface elements that divert the agent from its assigned task. To effectively measure the risks of such attacks, we introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP). TRAP makes three contributions. First, it provides a flexible framework for generating adversarial injections, combining five modular dimensions. Second, it delivers a benchmark of 630 task suites on realistic website clones to measure agent susceptibility. Third, it introduces an objective one-click hijack evaluation method that avoids reliance on LLM judges and reduces ambiguity from agent skill gaps. We evaluate six frontier models on TRAP and find that agents are hijacked in 25\% of cases on average, with hijack success rates ranging from 13\% on GPT-5 to 43\% on DeepSeek-R1. We find that small design choices, such as using buttons instead of hyperlinks or lightly tailoring attacks to the environment, can multiply success rates. Moreover, effective hijacks often transfer across models, revealing systemic vulnerabilities. By releasing TRAP, we provide a reproducible, modular and extensible benchmark for systematically evaluating hijacking risks in web-based agents.

Supplementary Material: zip

Primary Area: datasets and benchmarks

Submission Number: 22043

Loading