Learning without Memorizing Considered Infeasible: Rethinking Memorization in LLMs

09 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Memorization Measures in LLMs, Memorization and Learning, Privacy Risks of Memorization
TL;DR: A critical assessment of different memorization measures and their implications for large language models, focusing on whether learning without memorization is feasible and whether the privacy risks of memorization are exaggerated.
Abstract: Memorization during learning is considered undesirable for two distinct reasons. First, from a privacy perspective, memorization raises concerns about potential leakage of sensitive information in the training data. Second, from a learning perspective, memorization raises concerns of sub-optimal learning and over-fitting. In this paper, we rethink measures of memorization in large language models (LLMs). We find that existing *measures of memorization*, namely recollection-based and counterfactual measures, are designed to capture privacy concerns but ignore optimal-learning concerns. We propose a new memorization measure, called *contextual memorization*, that captures LLMs' tendency to locally over-fit some strings in the training data before others over multiple epochs of training. Applying these measures when training LLMs leads us to two striking conclusions. First, a systematic analysis of all the measures shows that our new measure avoids a major pitfall of prior measures by distinguishing context-based recollection from memorization-based recollection of a training string. Using our measure, we revisit prior reported instances of training-data memorization by real-world LLMs and find that many instances can be explained away by contextual learning-based recollection, i.e., the prior memorization reports are likely exaggerated. Second, we find that when LLMs learn a language optimally, they inevitably end up *memorizing* some portions of the training data. We support our conclusions with extensive experiments training 18 LLMs from 6 model families to learn a variety of formal languages.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 3287