Memory-Induced Tool-Drift in LLM Agents
Keywords: Safety, Agents, Memory
Abstract: Modern LLM agents combine long-term memory for personalization with tool-calling
interfaces for taking actions in the world, a combination underpinning
contemporary production systems. We study a previously unexamined failure
of this combination: when personality-driven biases stored in memory (cost-consciousness,
impatience, risk tolerance, etc.) silently affect tool calls in contexts
where they are not applicable. We call this memory-induced tool-drift and operationalize it through MEMDRIFT, a benchmark of 105 scenarios spanning five
bias dimensions and seven professional domains, generated through an automated
adversarial pipeline. Across seven frontier models—including those with extended
reasoning—biased memories raise deflection scores (a judge-scored measure of
parameter deviation from unbiased baselines) by up to 3.6 points on a 1–5 scale.
Tool-drift persists when memory management is handled by three production memory architectures. The phenomenon affects real-world tools: scanning 6,062 tools
across 288 verified MCP servers, we flag 608 with susceptible parameters and
confirm tool-drift on a validated subset. Mechanistically, biased memories act as
implicit steering vectors, pushing activations along the same latent directions as
explicit behavioral instructions. They also redistribute attention from task-relevant
context toward memory entries that share surface-level keywords with the target
parameter. Standard defenses—prompt-based relevance instructions and memory
filters—reduce drift but do not eliminate it. As agents take increasingly consequential actions on a user’s behalf, memory-induced tool-drift represents a systematic
vulnerability that current safeguards do not address, motivating dedicated defenses
at the intersection of memory management and tool-call generation.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 290