Probing Layer-wise Memorization and Generalization in Deep Neural Networks via Model Stitching

TMLR Paper6698 Authors

28 Nov 2025 (modified: 30 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: It is well known that deep neural networks can both memorize randomly labeled training data and generalize to unseen inputs. However, despite several prior efforts, the mechanism and dynamics of how and where memorization takes place in the network remain unclear, with contradictory findings in the literature. To address this, we study the functional similarity of the layers of a memorizing model to those of a model that generalizes. Specifically, we leverage model stitching as a tool for layer-wise comparison of a memorized noisy model, trained on a partially noisy-labeled dataset, to a generalized clean model, trained on a noise-free dataset. Our simple but effective approach guides the design of experiments that shed light on the learning dynamics of different layers in deep neural networks and on why models with harmful memorization can still generalize well. Our results show that early layers are as important as deeper ones for generalization. We find that ``cleaning'' the early layers of the noisy model improves the functional similarity of its deeper layers to the corresponding layers of the clean model. Moreover, cleaning the noise in the early layers of the noisy model can drastically reduce memorization and improve generalization. Furthermore, fixing the noise up to a certain depth yields generalization similar to that of a noise-free model. Interestingly, however, the reverse may not hold: if the early layers are noisy but the deeper layers are noise-free, perfect memorization cannot be achieved, emphasizing the dominant role of deeper layers in memorization. Our extensive experiments on four architectures - a customized CNN, ResNet-18, ResNet-34, and ResNet-50 - and three datasets - SVHN, CIFAR-10, and CIFAR-100 - with varying levels of noise consistently corroborate our findings.
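The stitching procedure the abstract refers to composes the early layers of one network with the deeper layers of another through a small trained "stitching" map. A minimal sketch of this forward composition is below; the layer definitions, the depth `k`, and the fixed scalar stitching map are all illustrative assumptions, not the paper's actual models (in practice the stitch is a trained linear or 1x1-convolution layer between matching feature maps):

```python
# Hedged sketch of model stitching: run the "noisy" model A up to depth k,
# translate its activations with a stitching map, then finish the forward
# pass with the "clean" model B's remaining layers.

def make_layer(w, b):
    # toy 1-D affine layer followed by ReLU, for illustration only
    return lambda x: max(0.0, w * x + b)

model_a = [make_layer(2.0, 0.0), make_layer(0.5, 1.0)]   # stand-in for noisy model
model_b = [make_layer(1.0, -0.5), make_layer(3.0, 0.0)]  # stand-in for clean model

def stitched_forward(x, early, late, k, stitch):
    # A's layers 0..k-1, then the stitching map, then B's layers k..end
    for layer in early[:k]:
        x = layer(x)
    x = stitch(x)
    for layer in late[k:]:
        x = layer(x)
    return x

# Hypothetical fixed stitching map; in the actual method this is trained
# while both surrounding models stay frozen.
stitch = lambda h: 0.5 * h

out = stitched_forward(1.0, model_a, model_b, k=1, stitch=stitch)
# A's first layer gives 2.0, the stitch maps it to 1.0, B's last layer gives 3.0
```

Comparing the stitched network's accuracy against the clean model's own accuracy, layer by layer, is what gives the paper its functional-similarity signal: if stitching at depth k costs little accuracy, A's first k layers are functionally interchangeable with B's.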
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Stanislaw_Kamil_Jastrzebski1
Submission Number: 6698