Hidden in Memory: Sleeper Memory Poisoning in LLM Agents

Published: 23 May 2026, Last Modified: 23 May 2026
Venue: ICML 2026 AIWILD
License: CC BY 4.0
Keywords: AI Security, LLM Agents, Memory Poisoning
TL;DR: Sleeper memory poisoning implants a fabricated memory via adversarial external content, allowing it to persist and influence future conversations after the original attack context is gone.
Abstract: Large language models are increasingly augmented with persistent memory, allowing assistants to store user-specific information across sessions for personalization and continuity. This statefulness introduces a new security risk: adversarial content can corrupt what an assistant remembers and thereby influence future interactions. We propose and study \emph{sleeper memory poisoning}, a delayed attack in which an adversary manipulates external context, such as a document, webpage, or repository, to cause the assistant to store a fabricated memory about the user. Unlike conventional prompt injection, the attack can remain dormant and re-emerge across multiple later conversations. We evaluate the full attack pipeline: whether poisoned memories are written, later retrieved, and ultimately used to steer subsequent conversations. Across stateful LLM assistants, poisoned memories are written at rates of up to 99.8% on GPT-5.5 and 95% on Kimi-K2.6. Crucially, among successful injections, poisoned memories cause attacker-intended agentic actions in 60--89% of evaluations across models. These results show that persistent memory can act as a long-term attack surface spanning multiple future conversations.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 287
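
Illustrative sketch (not from the paper): the abstract reports results at three stages of the pipeline, namely whether a poisoned memory is written, whether it is later retrieved, and whether it leads to the attacker-intended action, with the action rate conditioned on successful injection. The Python below shows one plausible way such stage-wise rates could be tallied from per-trial outcomes. The `Trial` record, the `funnel_rates` helper, and the sample data are hypothetical and only assume that each trial logs three boolean outcomes; the paper's actual evaluation protocol may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trial:
    """Hypothetical per-trial outcome record (assumed, not from the paper)."""
    written: bool    # the poisoned memory was stored
    retrieved: bool  # the stored memory surfaced in a later conversation
    acted: bool      # the assistant took the attacker-intended action


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 rather than dividing by zero when a stage has no trials.
    return numerator / denominator if denominator else 0.0


def funnel_rates(trials: List[Trial]) -> Dict[str, float]:
    """Compute stage-wise rates, conditioning later stages on a successful write."""
    written = [t for t in trials if t.written]
    retrieved = [t for t in written if t.retrieved]
    acted = [t for t in retrieved if t.acted]
    return {
        "write_rate": _ratio(len(written), len(trials)),
        "retrieval_rate_given_write": _ratio(len(retrieved), len(written)),
        "action_rate_given_write": _ratio(len(acted), len(written)),
    }


# Synthetic example data, for illustration only.
sample = [
    Trial(written=True, retrieved=True, acted=True),
    Trial(written=True, retrieved=True, acted=False),
    Trial(written=True, retrieved=False, acted=False),
    Trial(written=False, retrieved=False, acted=False),
]
rates = funnel_rates(sample)
```

Conditioning the later stages on a successful write keeps the end-to-end effect of a stored memory separate from how often the injection succeeds in the first place, which matches the "among successful injections" framing in the abstract.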