Is Memorization Actually Necessary for Generalization

17 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Memorization, Generalization
TL;DR: We find that memorization decreases generalization, contrary to the findings of the popular work by Feldman et al.
Abstract: \begin{abstract} Deep learning models are known for their ability to memorize training data. While memorization is often linked to risks such as privacy leakage and poor robustness, a highly influential work by~\citet{feldman2020longtail} argues that memorization is actually \textit{necessary} for generalization. Their conclusion is based on the observation that removing points with high memorization scores reduces test accuracy. Upon closer inspection of their work, we uncover four critical flaws in the underlying methodology: \textbf{(1) sampling bias} in their approximation algorithm that inflates memorization scores; \textbf{(2) a high false positive rate} in their definition of memorization that leads to misclassification of non-memorized points as memorized; \textbf{(3) unprincipled thresholding} that results in an ill-posed problem; and \textbf{(4) data leakage} that skews the test accuracy results. To address these limitations, we introduce modifications for correctly identifying and evaluating memorization: higher sampling rates, a revised memorization definition that reduces the false positive rate, a method for identifying a principled score threshold, and test datasets specifically designed to avoid data leakage. Having accounted for these errors, our results show that, in contradiction to the original work, removing truly memorized points does not cause a drop in accuracy and, in most cases, improves test performance. These findings call into question the necessity of memorization in deep learning and highlight the importance of mitigating its risks. \end{abstract}
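To make the notion of a memorization score concrete, the sketch below estimates a leave-one-out style score by subsampling. It assumes the score is the difference between a model's accuracy on a training point when that point was in the training subset and when it was held out, averaged over many trained models. This is an illustrative reading, not the authors' implementation; the function name `estimate_memorization` and its input layout are hypothetical.

```python
def estimate_memorization(in_subset, correct):
    """Estimate a per-example memorization score from repeated training trials.

    in_subset[t][i] -- True if example i was in the training subset of trial t
    correct[t][i]   -- True if the model from trial t classified example i correctly

    Returns a list of scores in [-1, 1]; an entry is None when example i was
    never included or never excluded, so the difference is undefined.
    (Hypothetical helper; the input layout is an assumption.)
    """
    num_trials = len(in_subset)
    num_examples = len(in_subset[0])
    scores = []
    for i in range(num_examples):
        # Outcomes on example i, split by whether the model trained on it.
        included = [correct[t][i] for t in range(num_trials) if in_subset[t][i]]
        excluded = [correct[t][i] for t in range(num_trials) if not in_subset[t][i]]
        if not included or not excluded:
            scores.append(None)
            continue
        acc_in = sum(included) / len(included)
        acc_out = sum(excluded) / len(excluded)
        scores.append(acc_in - acc_out)
    return scores


# Toy data: 4 trials, 2 examples.
in_subset = [[True, True], [True, False], [False, True], [False, False]]
correct = [[True, True], [True, True], [False, True], [False, True]]
scores = estimate_memorization(in_subset, correct)
```

In this toy data, example 0 is classified correctly only by models that trained on it, while example 1 is classified correctly either way. With only a handful of trials per point such estimates are noisy, which is one reason the number of sampled models matters.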
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 9920