IntentGuard: Securing MCP-Enabled LLM Agents via Post-Decision Semantic Plan Verification

Published: 19 Dec 2025, Last Modified: 05 Jan 2026AAMAS 2026 ExtendedAbstractEveryoneRevisionsBibTeXCC BY 4.0
Keywords: LLM Agents, Model Context Protocol (MCP), Tool Metadata Poisoning, Semantic Verification
Abstract: The emergence of the Model Context Protocol (MCP) has significantly enhanced the capabilities of Large Language Model (LLM) agents by enabling dynamic tool discovery and invocation. However, MCP introduces a critical new security threat: Tool Metadata Poisoning. By manipulating the metadata descriptions of external tools, attackers can mislead LLMs into generating tool invocation plans that are syntactically valid but semantically incorrect, causing harmful results while evading existing defense mechanisms. To address this threat, we propose the Intention-Plan Consistency Paradigm, which shifts agent protection to post-decision semantic verification, independent of the agent’s potentially compromised reasoning. Based on this paradigm, we introduce VISTA, which combines information isolation that create a minimal reliable context stripping away unrelated tool metadata, and hierarchical semantic assessment, which validates tool selection and parameter provenance to counter different threat vectors introduced by tool metadata poisoning. We construct MCPIntentEval, a new benchmark to evaluate intent alignment verification in MCP-enabled LLM agents, which includes tool invocations from 7 agent models, 45 real MCP servers, and 353 tools. Extensive evaluations on MCPIntentEval show that VISTA outperforms state-of-the-art baselines, achieving a 13.13\% increase in accuracy, a 9.48\% improvement in F1-score, and a 90\% reduction in false positive rates. Additional experiments across five Judge Model scales and various inference modes further highlight VISTA's robustness in detecting inconsistent outputs. Our code is publicly available at https://anonymous.4open.science/r/MCP-IntentVal-D173/.
Area: Generative and Agentic AI (GAAI)
Generative A I: I acknowledge that I have read and will follow this policy.
Submission Number: 128
Loading