VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation

Harshil Patel; Kunal Pai

Published: 23 May 2026, Last Modified: 06 Jun 2026

Venue: ICML 2026 AIWILD

License: CC BY 4.0

Keywords: MCP security, indirect prompt injection, error-path injection, tool-calling agents, mutation testing, adversarial payloads, implicit authority

TL;DR: Tool error messages carry implicit authority: error-path injection triples prompt injection success rates and achieves up to 100% compliance across four frontier models, though production CLI frameworks with guardrails remain resilient.

Abstract: As the Model Context Protocol (MCP) standardizes tool-calling for autonomous agents, it introduces a critical, unexamined attack surface: the error-handling loop. We hypothesize that tool error messages possess implicit authority, triggering corrective reasoning modes that bypass standard safety heuristics. We introduce VATS (Vulnerability Analysis of Tool Streams), a mutation-driven framework that systematically evolves adversarial payloads across seven structural and linguistic dimensions. Our evaluation across four frontier models (Gemini 3.1 Pro, GPT-5.5, GLM-5.1, and Qwen3-Coder) demonstrates that error-path injection triples the success rate of standard indirect prompt injection (IPI), achieving up to 100% compliance. We isolate structural positioning (sandwiching instructions within error context) as the most effective universal exploit vector. While we find that production framework guardrails can mitigate these vulnerabilities, the inherent susceptibility of the model layer poses a systemic risk to bespoke agentic workflows.
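
To make the "structural positioning" dimension mentioned in the abstract concrete, the following is a minimal sketch of a positional mutation operator for a test harness. It is not the authors' implementation: the names `ErrorContext`, `place`, and `positional_variants` are hypothetical, the `prefix` and `suffix` positions are assumed here only as contrasts to the sandwich placement the abstract names, and the instruction is a neutral placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorContext:
    """Hypothetical pieces of a simulated tool error message."""
    header: str  # first line of the error, e.g. its type
    detail: str  # trailing diagnostic line


def place(ctx: ErrorContext, instruction: str, position: str) -> str:
    """Return the error text with `instruction` at the chosen position.

    Only "sandwich" is named in the abstract; "prefix" and "suffix"
    are assumed baselines for comparison.
    """
    if position == "prefix":
        parts = [instruction, ctx.header, ctx.detail]
    elif position == "suffix":
        parts = [ctx.header, ctx.detail, instruction]
    elif position == "sandwich":
        # Instruction sits between two pieces of error context.
        parts = [ctx.header, instruction, ctx.detail]
    else:
        raise ValueError(f"unknown position: {position}")
    return "\n".join(parts)


def positional_variants(ctx: ErrorContext, instruction: str) -> dict[str, str]:
    """Generate one variant per position for side-by-side evaluation."""
    return {p: place(ctx, instruction, p) for p in ("prefix", "suffix", "sandwich")}


if __name__ == "__main__":
    ctx = ErrorContext("Error: request failed", "Trace id: abc123")
    for name, text in positional_variants(ctx, "<INSTRUCTION UNDER TEST>").items():
        print(f"--- {name} ---\n{text}")
```

In a harness of this shape, each variant would be returned as a simulated tool error to the agent under test, so that compliance can be compared across positions while the rest of the message is held fixed.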

Track: Short Paper (4 pages)

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 252
