Network-Level Prompt and Trait Leakage in Local Research Agents

Published: 23 May 2026, Last Modified: 23 May 2026
Venue: ICML 2026 AIWILD
Readers: Everyone
License: CC BY 4.0
Keywords: AI Agent, Network-level leakage, Prompt recovery, Trait inference, Privacy attacks and defenses
Abstract: We show that Web and Research Agents (WRAs) are vulnerable to inference attacks by passive network observers. Deploying WRAs \emph{locally} for privacy, legal, or financial reasons still exposes their traffic to DNS resolvers, malicious ISPs, VPNs, web proxies, and corporate or government firewalls. Unlike human web browsing, which is sporadic and sparse, WRAs visit $70{-}140$ domains per request with a distinct timing pattern, creating unique privacy risks. Specifically, we demonstrate a novel prompt and trait leakage attack that relies only on WRAs' network-level metadata. We first build a new dataset of WRA traces from real and synthetic user search queries. We then define a behavioral metric, OBELS, to comprehensively assess the similarity between original and inferred prompts, and show that our attack recovers over 73\% of the functional and domain knowledge of user prompts and up to 19 of 32 traits in a multi-session setting. Finally, we discuss mitigation strategies that constrain domain diversity or obfuscate traces; they have negligible impact on utility while reducing attack effectiveness by an average of 29\%.
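As a rough illustration of the metadata the abstract refers to, the sketch below summarizes what a passive observer could read off a single trace: how many domains were contacted, which ones recur, and how requests are spaced in time. It is not the paper's attack or dataset format. The function name `summarize_trace`, the `(timestamp, domain)` record layout, and the example domains are all assumptions made for this example.

```python
from collections import Counter


def summarize_trace(trace):
    """Summarize a trace given as (timestamp_seconds, domain) pairs.

    Hypothetical record layout: only metadata a passive observer
    (e.g. a DNS resolver) could see, with no page content.
    """
    ordered = sorted(trace)  # order observations by timestamp
    times = [t for t, _ in ordered]
    domains = [d for _, d in ordered]
    # Inter-arrival gaps between consecutive requests.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "num_requests": len(domains),
        "unique_domains": len(set(domains)),
        "top_domains": Counter(domains).most_common(3),
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# Made-up trace for illustration only.
example = [
    (0.0, "arxiv.org"),
    (0.4, "nih.gov"),
    (0.9, "arxiv.org"),
    (1.5, "who.int"),
]
summary = summarize_trace(example)
```

Even this coarse summary hints at the topic of a request through the mix of domains, which is the kind of signal the abstract describes as leaking from network-level metadata alone.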
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 200