SYMBOLICDRIFT: Measuring Reasoning Drift on Unverifiable Questions

Published: 04 Jun 2026, Last Modified: 04 Jun 2026 | ICML MemFM 2026 Workshop Poster | Readers: Everyone | License: CC BY 4.0
Keywords: unintended memorization
TL;DR: Injecting user memory/profile context into LLM prompts raises reasoning drift 20 to 80 percent above each model's noise floor, even when that context is completely irrelevant to the question being asked, and conventional accuracy metrics fail to detect this.
Abstract: Large Language Models are increasingly being deployed with persistent user memory, where preferences, traits, and prior context are surfaced into the prompt to personalize responses. Since open-domain questions have no ground-truth answer, reliability must be assessed through the stability of reasoning under semantically irrelevant context variation. The authors introduce SYMBOLICDRIFT, a reference-free framework that maps reasoning traces into a value ontology and quantifies trajectory divergence using Dynamic Time Warping and a Sequence Recurrence Index. They first validate SYMBOLICDRIFT as a sensitive and specific instrument, showing it can discriminate content-free perturbations from genuine semantic shifts with high cross-model convergent validity. They then demonstrate that even a single line of user-attribute context, completely irrelevant to the question being asked, produces measurable drift across four frontier LLMs and 13 categories of user attributes. Injected memory consistently elevates drift 20 to 80 percent above each model's noise floor, revealing that user memory, a feature increasingly central to LLM deployment, induces systematic shifts in reasoning that conventional accuracy metrics entirely miss.
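As a rough illustration of the trajectory-divergence idea in the abstract, the sketch below computes a textbook Dynamic Time Warping distance between two reasoning traces that have already been mapped to sequences of value labels, then expresses drift relative to a noise floor. It is not the authors' implementation: the label names, the 0/1 mismatch cost, and the helper names `dtw_distance` and `relative_elevation` are all assumptions made for this example, and the Sequence Recurrence Index is not sketched because the abstract does not define it.

```python
from typing import Sequence


def dtw_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Textbook DTW between two label sequences.

    Assumption: a 0/1 cost (0 if labels match, 1 otherwise); the paper's
    actual cost function over its value ontology is not given in the abstract.
    Returns inf if exactly one of the sequences is empty.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0.0 if a[i - 1] == b[j - 1] else 1.0
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def relative_elevation(drift: float, noise_floor: float) -> float:
    """Fractional increase of drift over a noise floor (0.2 means 20% above)."""
    if noise_floor <= 0:
        raise ValueError("noise_floor must be positive")
    return (drift - noise_floor) / noise_floor


# Hypothetical value-label trajectories (labels invented for illustration).
baseline = ["evidence", "fairness", "autonomy", "conclusion"]
paraphrase = ["evidence", "fairness", "fairness", "autonomy", "conclusion"]
with_memory = ["evidence", "loyalty", "autonomy", "conclusion"]

noise_like = dtw_distance(baseline, paraphrase)      # repeated label is absorbed by warping
memory_drift = dtw_distance(baseline, with_memory)   # one substituted label
```

In this toy setup the paraphrased trace differs from the baseline only by a repeated label, which warping aligns at no cost, while the trace with injected memory substitutes one label and so incurs a nonzero distance. A real noise floor would presumably be estimated from many content-free perturbations per model rather than from a single pair.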
Submission Number: 60