Untrusted Content Masking for Web Agents with Security Guarantees
Keywords: agents, security, prompt injections, web agents
TL;DR: Securing web agents from prompt injection by masking third-party content based on DOM structure, achieving provable security while preserving high task utility.
Abstract: Defenses that provide security guarantees against prompt injection attacks require strict isolation between an agent's task planning and data processing capabilities. This prevents third-party content from overwriting trusted instructions. In text-based environments such as tool-use APIs, agents can plan from interface definitions without ever processing untrusted data. Web agents, however, face a fundamental challenge: they must observe the rendered page to perceive their environment, but that page already contains untrusted third-party content. In this paper, we present Untrusted Content Masking, a simple, effective approach that enables web agents to observe their environment and plan without directly processing untrusted content. We leverage a key structural insight: a webpage's Document Object Model (DOM) structure alone suffices to identify untrusted regions. Our framework exploits this by redacting such regions before they reach the agent, and restricting interaction to a sandboxed interface with strict privilege separation.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 170
Loading