Keywords: Agents, Redteaming, Evaluations, Reasoning, Foundation Models
TL;DR: Dark patterns are a category of adversarial attack that de
Abstract: Deceptive UI designs, commonly known as dark patterns, manipulate users into
performing actions misaligned with their goals. In this paper, we show that dark
patterns are highly effective in altering web agent behavior, posing a significant
risk given how widely web agents are deployed. To quantify this risk, we introduce
DECEPTICON, an environment for testing individual dark patterns in isolation.
DECEPTICON includes 850 web navigation tasks with dark patterns—600 generated
tasks and 250 real-world tasks, designed to evaluate both task success and
dark pattern effectiveness. Testing frontier large language models and state-of-
the-art agent scaffolds, we find dark patterns succeed in 70% of tested generated
and real-world tasks. Moreover, the effectiveness correlates positively with model
size and test-time reasoning, making larger, more capable models more susceptible.
Leading defense methods, including in-context prompting and multi-agent
verification, fail to consistently reduce dark pattern success. Our findings reveal
dark patterns as a latent, unmitigated risk to web agents, highlighting the urgent
need for robust defenses against manipulative designs.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22655