A2M: Trace-Optimized Agent Hijacking in the MCP Ecosystem

ACL ARR 2026 January Submission5674 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, Readers: Everyone, License: CC BY 4.0
Keywords: LLM Agents, Model Context Protocol, Adversarial Attacks
Abstract: The evolution of Large Language Models into autonomous agents via the Model Context Protocol (MCP) introduces a critical security vulnerability: agents rely on semantic matching to select tools from unverified third-party MCP servers, which creates a novel semantic supply chain attack surface. We introduce A2M (Attraction-to-Manipulation), a two-stage black-box optimization framework designed to systematically hijack MCP agents. In the Attraction phase, A2M optimizes tool metadata to maximize the probability that the agent selects the attacker's tool. In the Manipulation phase, a trace-driven Analyzer-Optimizer further optimizes the tool, crafting adversarial return payloads that steer agent reasoning towards attacker-desired outcomes. Extensive evaluations on LiveMCPBench tasks across various frontier models, including both proprietary and open-weight architectures, demonstrate the severity of this threat. In cognitive denial-of-service attacks, A2M inflates token costs by up to 32.4×, and it achieves high success rates on information exfiltration, environment integrity compromise, and reasoning derailment. Furthermore, optimized tools exhibit strong transferability to unseen models and successfully bypass existing perplexity-based defenses and lightweight auditors. Our findings underscore the fragility of the tool selection layer and highlight an urgent need for robust vetting and isolation mechanisms in agentic ecosystems. A2M is open source and anonymously available at https://anonymous.4open.science/r/A2M-63A0.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: AI agents, Adversarial attacks, Security and privacy, Red teaming
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 5674