On the Dynamics & Transferability of Latent Generalization during Memorization

TMLR Paper 6781 Authors

02 Dec 2025 (modified: 03 Dec 2025), Under review for TMLR, CC BY 4.0
Abstract: Deep networks are known to have extraordinary generalization abilities, achieved through mechanisms that are not yet well understood. It is also known that when the labels in the training data are shuffled to varying degrees, deep networks trained with standard methods can still achieve perfect or high accuracy on this corrupted training data. This phenomenon is called memorization, and it typically comes at the cost of poorer generalization to the true labels. Recent work has demonstrated, surprisingly, that the internal representations of such models retain significantly better latent generalization abilities than is directly apparent from the model itself. In particular, it has been shown that such latent generalization can be recovered via simple probes (called MASC probes) on the layer-wise representations of the model. However, several basic questions about this phenomenon of latent generalization remain poorly understood: (1) What is the origin of latent generalization during memorization, and how does it evolve over training? Specifically, do model generalization and latent generalization rely on largely the same underlying mechanisms? (2) Is the specific nature of the probe critical for our ability to extract latent generalization from the model's layer-wise outputs? (3) Does there exist a way to immediately transfer latent generalization to model generalization by suitably modifying the model weights directly? On the one hand, this question is conceptually important because an affirmative answer would establish conclusively that the latent generalization manifested by the probe is also within reach of the model itself, using exactly the information the model was provided during training, namely the corrupted training data. On the other hand, and more pragmatically, it also suggests the possibility of "repairing" a trained model that has memorized, without requiring expensive retraining from scratch.

To address (1), we empirically track the training dynamics and find that latent generalization largely peaks early in training, together with model generalization, suggesting a common origin for both. However, while model generalization degrades steeply over subsequent training, latent generalization falls more modestly and plateaus at a higher level over the training epochs. These experiments lend circumstantial evidence to the hypothesis that latent generalization relies on largely the same mechanisms that underlie the model's generalization in the early phases of training. To investigate (2), we examine the MASC probe and show that it is a quadratic classifier. The question in (2) thus becomes whether the quadratic nature of the MASC probe underlies its remarkable effectiveness in extracting latent generalization; if this were so, a linear probe constructed along the same lines would not be as effective. To investigate this, we design a new linear probe for this setting and find, surprisingly, that in most cases it achieves superior generalization performance compared to the quadratic probe. This suggests that the quadratic nature of the probe is not critical to extracting latent generalization. Importantly, the effectiveness of the linear probe enables us to answer (3) in the affirmative. Specifically, using this new linear probe, we devise a way to transfer the latent generalization present in the last-layer representations to the model by directly modifying the model weights. This immediately endows such models with improved generalization, i.e., without additional training.
Our findings provide a more detailed account of the rich dynamics of latent generalization during memorization, clarify the specific role of the probe in extracting latent generalization, and demonstrate a means of transferring this generalization directly to the model.
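To make the probe-to-model weight transfer described in the abstract concrete, the following is a minimal sketch of one way such a transfer could be implemented. It is not the paper's method: the abstract does not specify the exact construction of the new linear probe, so the probe below is a generic multinomial logistic regression fit on frozen last-layer features with the (corrupted) training labels. The interfaces model.features(x) (returning the representation feeding the final linear layer) and model.classifier (the final linear layer) are hypothetical names introduced for illustration.

import torch
import torch.nn as nn

@torch.no_grad()
def extract_features(model, loader, device="cpu"):
    # Collect penultimate-layer representations together with the (corrupted)
    # labels the model was trained on. `model.features` is an assumed interface.
    feats, labels = [], []
    model.eval()
    for x, y in loader:
        feats.append(model.features(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def fit_linear_probe(feats, labels, num_classes, epochs=200, lr=1e-2):
    # A plain linear probe (multinomial logistic regression) trained on frozen
    # features; a generic stand-in, not the paper's specific probe.
    probe = nn.Linear(feats.shape[1], num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(probe(feats), labels).backward()
        opt.step()
    return probe

@torch.no_grad()
def transfer_probe_to_model(model, probe):
    # "Repair" the memorizing model without retraining by overwriting its
    # final classification layer with the probe's parameters. Assumes the
    # last layer is exposed as `model.classifier` with matching shapes.
    model.classifier.weight.copy_(probe.weight)
    model.classifier.bias.copy_(probe.bias)
    return model

Whether this simple weight copy recovers the improvement reported in the paper depends on how the authors' linear probe is actually constructed; the sketch is only meant to illustrate what "transferring latent generalization by directly modifying model weights" can look like in practice.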
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~William_T_Redman1
Submission Number: 6781