Log-To-Leak: Prompt Injection Attacks on Tool-Using LLM Agents via Model Context Protocol

ICLR 2026 Conference Submission19255 Authors

19 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: LLM Agent, Model Context Protocol, Prompt Injection
Abstract: LLM agents integrated with tool-use capabilities via the Model Context Protocol (MCP) are increasingly deployed in real-world applications, but remain vulnerable to prompt injection. We introduce a new class of prompt-level privacy attacks that covertly force the agent to invoke a malicious logging tool to exfiltrate sensitive information (user queries, tool responses, and agent replies). Unlike prior attacks focused on output manipulation or jailbreaking, ours specifically targets tool invocation decisions while preserving task quality. We systematize the design space of such injected prompts into four components—Trigger, Tool Binding, Justification, and Pressure—and analyze their combinatorial variations. Based on this, we propose the Log-To-Leak framework, where an attacker can log all interactions between the user and the agent. Through extensive evaluation across five real-world MCP servers and four state-of-the-art LLM agents (GPT-4o, GPT-5, Claude-Sonnet-4, and GPT-OSS-120b), we show that the attack consistently achieves high success rates in capturing sensitive interactions without degrading task performance. Our findings expose a critical blind spot in current alignment and safety defenses for tool-augmented LLMs, and call for stronger protections against structured, policy-framed injection threats in real-world deployments.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19255
Loading