How Dark Patterns Manipulate Web Agents

ICLR 2026 Conference Submission22655 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Agents, Redteaming, Evaluations, Reasoning, Foundation Models

TL;DR: Dark patterns are a category of adversarial attack that de

Abstract: Deceptive UI designs, commonly known as dark patterns, manipulate users into performing actions misaligned with their goals. In this paper, we show that dark patterns are highly effective in altering web agent behavior, posing a significant risk given the wide applications of web agents. To quantify this risk, we introduce DECEPTICON, an environment for testing individual dark patterns in isolation. DECEPTICON includes 850 web navigation tasks with dark patterns (600 generated tasks and 250 real-world tasks), designed to evaluate both task success and dark pattern effectiveness. Testing frontier large language models and state-of-the-art agent scaffolds, we find dark patterns succeed in 70% of tested generated and real-world tasks. Moreover, the effectiveness correlates positively with model size and test-time reasoning, making larger, more capable models more susceptible. Leading defense methods, including in-context prompting and multi-agent verification, fail to consistently reduce dark pattern success. Our findings reveal dark patterns as a latent, unmitigated risk to web agents, highlighting the urgent need for robust defenses against manipulative designs.

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 22655
