InjecMEM: Memory Injection Attack on LLM Agent Memory Systems

ICLR 2026 Conference Submission 18062 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Memory Injection Attack, LLM Agent, Agent Memory System, Agent Safety
TL;DR: InjecMEM is a targeted memory injection attack that requires only one interaction with the agent to steer later responses to related queries toward a pre-specified output.
Abstract: Memory is becoming a default subsystem in deployed LLM agents, providing long-horizon personalization and cross-session coherence. This naturally prompts a question: does the memory system introduce new vulnerabilities into LLM agents? We propose **InjecMEM**, a targeted memory injection attack that requires only one interaction with the agent (no read/edit access to the memory store) to steer later responses to related queries toward a pre-specified output. Guided by the retrieval-then-generate mechanism of the memory system, we split the crafted injection into two cooperating parts. The first part is a retriever-agnostic anchor: a concise, on-topic passage with a few high-recall cues that ensures topic-conditioned retrieval, so that segment summaries and keywording route the record into the target topic. The second part is an adversarial command: a short sequence optimized to remain effective under uncertain fused contexts, variable placements, and long prompts, so that it reliably steers the output once retrieved. We learn this sequence with a gradient-based coordinate search that averages likelihood across multiple synthetic prompt templates and insertion positions. Evaluated on a recent layered memory system (MemoryOS) across several domains, InjecMEM achieves precise topic-conditioned retrieval and targeted generation, persists after benign drift, and leaves non-target queries unaffected. We also demonstrate an indirect attack path in which a compromised tool writes the poison that normal queries later retrieve. Our results underscore the need to harden memory subsystems against adversarial records, and we provide a reproducible framework for studying the security of memory-augmented agents.
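The command-optimization step described above (averaging over templates and insertion positions) can be illustrated with a minimal toy sketch. This is not the paper's implementation: it replaces the gradient-guided candidate ranking over an LLM's likelihood with an exhaustive gradient-free coordinate sweep over a tiny vocabulary, and the surrogate loss, `avg_loss`, `coordinate_search`, and all token names are hypothetical placeholders.

```python
import itertools

def avg_loss(adv_tokens, templates, positions, loss_fn):
    # Average the surrogate loss of the adversarial sequence over every
    # (template, insertion position) pair, mirroring the idea of
    # optimizing in expectation over uncertain fused contexts.
    pairs = list(itertools.product(templates, positions))
    total = 0.0
    for ctx, pos in pairs:
        fused = ctx[:pos] + adv_tokens + ctx[pos:]
        total += loss_fn(fused)
    return total / len(pairs)

def coordinate_search(adv_tokens, vocab, templates, positions, loss_fn, sweeps=3):
    # Greedy coordinate search: repeatedly sweep over positions of the
    # adversarial sequence and substitute the vocabulary token that most
    # reduces the averaged loss; stop when a full sweep makes no progress.
    adv = list(adv_tokens)
    best = avg_loss(adv, templates, positions, loss_fn)
    for _ in range(sweeps):
        improved = False
        for i in range(len(adv)):
            for tok in vocab:
                cand = adv[:i] + [tok] + adv[i + 1:]
                score = avg_loss(cand, templates, positions, loss_fn)
                if score < best:
                    adv, best, improved = cand, score, True
        if not improved:
            break
    return adv, best
```

In a real attack the loss would be the negative target-output likelihood under the victim model and candidates would be ranked by token gradients; here any deterministic `loss_fn` over token sequences can stand in for demonstration.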
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18062