On the Dynamics & Transferability of Latent Generalization during Memorization

TMLR Paper 6781 Authors

02 Dec 2025 (modified: 20 Feb 2026) · Decision pending for TMLR · CC BY 4.0
Abstract: Deep networks are known to have extraordinary generalization abilities, via mechanisms that are not yet well understood. It is also known that when training labels are shuffled to varying degrees, deep networks trained with standard methods can still achieve perfect or near-perfect accuracy on the corrupted training data. This phenomenon is called memorization, and it typically comes at the cost of poorer generalization to the true labels. Recent work has demonstrated, surprisingly, that the internal representations of such models retain significantly better latent generalization abilities than is directly apparent from the model's outputs. In particular, it has been shown that this latent generalization can be recovered via simple probes (called MASC probes) on the layerwise representations of the model. However, the origin and dynamics of this latent generalization over the course of memorization are not well understood. Here, we empirically track the training dynamics and find that latent generalization largely peaks early in training, alongside model generalization. Next, we investigate whether the specific nature of the MASC probe is critical to extracting latent generalization from the model's layerwise outputs. To this end, we first examine the mathematical structure of the MASC probe and show that it is a quadratic, i.e., non-linear, classifier. This raises the possibility that latent generalization is not linearly decodable, and that a model trained on corrupted data is fundamentally incapable of generalizing as well as the MASC probe. To investigate this, we design a new linear probe for this setting and find, surprisingly, that it outperforms the quadratic probe in most, though not all, cases. Given that latent generalization is linearly decodable in most cases, we ask whether probes on layerwise representations can be leveraged to edit model weights directly, so that latent generalization manifests as model generalization. To this end, we devise a way to transfer the latent generalization present in last-layer representations into the model using the new linear probe. This immediately, i.e., without additional training, endows such models with improved generalization in most cases. We also explore the training dynamics that follow when this weight editing is performed midway through training. Our findings provide a more detailed account of the rich dynamics of latent generalization during memorization, clarify the specific role of the probe in extracting it, and demonstrate a means of directly transferring this generalization to the model.
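The abstract alone does not specify the transfer procedure, so the following is only a minimal sketch of the general idea it describes: fit a linear probe on frozen last-layer representations, then copy the probe's parameters into the model's final classifier so the latent generalization is realized by the model itself, with no further training. It assumes a PyTorch model exposing a `backbone`/`fc` split and a multiclass head, and uses an off-the-shelf logistic-regression probe as a stand-in for the paper's purpose-built linear probe; all names (`model`, `loader`, `collect_features`) are illustrative, not the authors' code.

```python
# Hypothetical sketch, not the paper's exact procedure.
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def collect_features(model, loader, device="cpu"):
    """Run the frozen backbone and stack penultimate-layer features."""
    model.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(model.backbone(x.to(device)).cpu())  # assumes a backbone/fc split
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

def transfer_probe_to_model(model, loader, device="cpu"):
    """Fit a linear probe on last-layer representations, then edit the
    model's final linear layer to realize the probe (no extra training)."""
    X, y = collect_features(model, loader, device)
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    with torch.no_grad():
        # coef_ has shape (n_classes, n_features) for a multiclass head,
        # matching model.fc.weight; intercept_ matches model.fc.bias.
        model.fc.weight.copy_(torch.tensor(probe.coef_, dtype=model.fc.weight.dtype))
        model.fc.bias.copy_(torch.tensor(probe.intercept_, dtype=model.fc.bias.dtype))
    return model
```

Because the edit rewrites only the final linear layer and leaves the learned representations untouched, it is consistent with the abstract's claim that the improved generalization appears immediately, without additional training.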
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~William_T_Redman1
Submission Number: 6781