Supplementary Material: pdf
Track: Extended Abstract Track
Keywords: Representational similarity metrics, memorization vs generalization, model stitching
TL;DR: Comparing representational and functional similarity between the layers of a model trained with noisy labels and one trained with clean labels, using CKA and model stitching, to understand memorization.
Abstract: It is well known that deep neural networks can memorize even randomly labeled training data, raising questions about our understanding of their generalization behavior. However, despite several prior efforts, the mechanism and dynamics of how and where in the network memorization takes place are still not well understood, with contradictory findings in the literature. In this work, we use representation similarity analysis methods, in particular Centered Kernel Alignment (CKA) and model stitching, as a simple but effective way to guide the design of experiments that shed light on the learning dynamics of the different layers of a deep neural network trained with random (i.e., noisy) labels. Our results corroborate some previous findings in the literature, provide new insights into the representations learned by the layers of the network when trained with varying degrees of label noise, and offer guidance on how techniques such as model stitching can best be leveraged to assess the functional similarity between a model that has memorized and another model that generalizes well.
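For readers unfamiliar with the metric, below is a minimal NumPy sketch of the linear variant of CKA (Kornblith et al., 2019) applied to two activation matrices, e.g., from corresponding layers of the noisy and noise-free models. The function and variable names, and the choice of the linear kernel, are illustrative assumptions; the abstract does not specify which kernel the submission uses.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape
    (n_samples, n_features); feature dimensions may differ."""
    # Center each feature column (CKA requires centered representations)
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

# Hypothetical usage: activations from one layer of each model
# on the same batch of inputs; a score near 1 indicates highly
# similar representations, near 0 indicates dissimilar ones.
acts_noisy = np.random.randn(256, 512)
acts_clean = np.random.randn(256, 512)
print(linear_cka(acts_noisy, acts_clean))
```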
Submission Number: 82