PriGuardAgent: Context-Aware Privacy Guardrails for Agentic Systems

ACL ARR 2026 January Submission4009 Authors

04 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: LLM agents, privacy
Abstract: The transition from Large Language Models (LLMs) to autonomous agents capable of tool execution has introduced complex, dynamic privacy risks that traditional safeguards fail to address. Existing defenses rely on static PII filters or rigid guardrail models and often lack the contextual reasoning required to detect subtle privacy violations in agentic workflows. To bridge this gap, we introduce PriGuardAgent, an agentic privacy guardrail framework designed to proactively detect risks in autonomous systems. PriGuardAgent leverages the Model Context Protocol (MCP) to unify diverse analysis tools—such as PII detection, data minimization, and compliance checking—into a plug-and-play architecture, enabling a dynamic planner to orchestrate specialized tools tailored to the interaction context. Furthermore, we incorporate a retrieval-augmented memory module that grounds decision-making in successful past analysis trajectories, effectively balancing precision and recall. Comprehensive evaluations on the PrivacyLens benchmark demonstrate that PriGuardAgent significantly outperforms existing guard models and single-turn detection methods. Specifically, PriGuardAgent achieves an average F1 score of 0.715 across Llama3, Mistral, and Zephyr agents, surpassing prompt-engineered privacy analysis models (average F1 0.629) and specialized guardrails such as WildGuard (F1 0.284) and Qwen3Guard (F1 0.162). These results showcase the potential of dynamic, reasoning-equipped agentic workflows for safeguarding privacy in next-generation agentic applications.
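The orchestration idea summarized in the abstract—a planner selecting specialized privacy-analysis tools per interaction, grounded by a retrieval memory of past trajectories—can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all names (`detect_pii`, `check_minimization`, the trivial word-overlap retrieval) are hypothetical stand-ins for the MCP-unified tools and retrieval-augmented memory described above.

```python
# Hypothetical sketch of planner-driven tool orchestration with a
# retrieval memory. Tool implementations are toy placeholders.
from dataclasses import dataclass, field


def detect_pii(text: str) -> list[str]:
    # Toy PII detector: flags tokens that look like email addresses.
    return [tok for tok in text.split() if "@" in tok]


def check_minimization(text: str) -> bool:
    # Toy data-minimization check: very long messages count as over-sharing.
    return len(text.split()) <= 20


# Plug-and-play registry standing in for MCP-exposed analysis tools.
TOOLS = {"pii": detect_pii, "minimization": check_minimization}


@dataclass
class Memory:
    """Stores past analysis trajectories for retrieval-augmented planning."""
    trajectories: list[dict] = field(default_factory=list)

    def retrieve(self, context: str) -> list[dict]:
        # Naive retrieval: return past cases sharing any word with the context.
        ctx = set(context.lower().split())
        return [t for t in self.trajectories
                if ctx & set(t["context"].lower().split())]


def plan(context: str, memory: Memory) -> list[str]:
    # Planner: always run PII detection; add the minimization check when
    # retrieved past cases for similar contexts flagged over-sharing.
    selected = ["pii"]
    if any(not t["minimization_ok"] for t in memory.retrieve(context)):
        selected.append("minimization")
    return selected


memory = Memory([{"context": "email draft to vendor", "minimization_ok": False}])
context = "agent sends email with user data alice@example.com"
for name in plan(context, memory):
    result = TOOLS[name](context)
```

Here the retrieved trajectory (an earlier over-sharing email case) causes the planner to schedule the minimization check in addition to PII detection, mirroring how past analysis outcomes steer tool selection in the framework.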
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: security/privacy
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4009