Rethinking Memorization Measures in LLMs: Recollection vs. Counterfactual vs. Contextual Memorization

Published: 11 Jun 2025 · Last Modified: 13 Jul 2025 · MemFM · CC BY 4.0
Keywords: Memorization, Contextual learning, Memorization Measures, Large Language Models
TL;DR: Investigating the (potentially conflicting) results of different memorization measures in LLMs.
Abstract: Memorization in large language models (LLMs) is often viewed as undesirable for learning. Existing memorization measures largely focus on quantifying privacy risks rather than capturing the underlying phenomenon of memorization itself. To address this gap, we introduce contextual memorization, which disentangles memorization from contextual learning (LLMs perform both during training). We further show that recollection-based, counterfactual, and contextual measures of memorization in LLMs yield contradictory results when applied to the same training dynamics, for example disagreeing on the order in which strings of varying frequencies are memorized.
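To make the contrast concrete, below is a minimal, hypothetical sketch (not the paper's code) of how the three families of measures could score the same string given per-token negative log-likelihoods. The function names, threshold, and toy numbers are illustrative assumptions; the contextual measure in particular is only an assumed form of "loss reduction beyond what context predicts."

```python
# Toy comparison of three memorization measures on per-token NLLs.
# All names, thresholds, and numbers are illustrative assumptions.
import numpy as np

def recollection_memorized(nll_with: np.ndarray, threshold: float = 0.05) -> bool:
    """Recollection-style: the trained model reproduces the string
    (near-zero loss on every token), regardless of how predictable it is."""
    return bool(np.all(nll_with < threshold))

def counterfactual_memorization(nll_with: np.ndarray, nll_without: np.ndarray) -> float:
    """Counterfactual-style: performance gap between models trained
    with vs. without the string in the training data."""
    return float(np.mean(nll_without) - np.mean(nll_with))

def contextual_memorization(nll_with: np.ndarray, nll_context: np.ndarray) -> float:
    """Contextual-style (assumed form): loss reduction beyond what
    contextual learning alone would predict for the string."""
    return float(max(0.0, np.mean(nll_context) - np.mean(nll_with)))

# A frequent, highly predictable string: a recollection measure flags it
# as memorized, while the contextual measure attributes the low loss
# mostly to contextual learning.
nll_with = np.array([0.01, 0.02, 0.01])      # model trained on the string
nll_without = np.array([0.20, 0.25, 0.22])   # counterfactual model
nll_context = np.array([0.03, 0.04, 0.02])   # predictable from context alone

print(recollection_memorized(nll_with))                    # True
print(counterfactual_memorization(nll_with, nll_without))  # ~0.21
print(contextual_memorization(nll_with, nll_context))      # ~0.017 (small)
```

On this toy example the measures already diverge: the string looks fully memorized under the recollection view but only weakly memorized under the contextual view, which is the kind of disagreement the abstract describes.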
Submission Number: 20