Deep Network Partition Density Exhibits Double Descent

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Double Descent, Partition Density, Linear Regions, Local Complexity
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We present a novel method to measure the local complexity of Deep Networks and show that this complexity exhibits a double descent phenomenon during training.
Abstract: The study of Deep Network (DN) training dynamics has largely focused on the dynamics of the loss function, evaluated on or around train and test set samples. In fact, many DN phenomena were first introduced in the literature with respect to loss or accuracy dynamics during training, e.g., double descent and grokking. No other statistic of a DN has been found to be as informative as the loss function. In this study, we provide a novel statistic that measures the underlying DN's local complexity and exhibits two key benefits: (i) it does not require any labels, and (ii) it is informative about the training loss and accuracy dynamics. Our proposed statistic is based on the concentration of partition regions around samples (which captures the local expressivity, or complexity, of a DN) and can be applied to arbitrary architectures, e.g., CNNs, VGGs, and ResNets. We show that our statistic exhibits a double descent phenomenon during training, with the partition density first decreasing around training samples, then increasing (ascent), followed by another descent during which neurons migrate towards the decision boundary. We observe this phenomenon across a number of different experimental setups, e.g., training with label noise and delayed generalization, i.e., grokking. Our observations provide a novel lens for studying DN training dynamics from a spline theory perspective.
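Below is a minimal sketch of one way a local partition-density statistic of this flavor could be approximated for a ReLU network: sample small perturbations around an input and count how many distinct activation patterns (i.e., distinct linear regions of the input-space partition) they fall into. This is an illustrative assumption, not the authors' exact estimator; the function `local_complexity`, the toy MLP, and the hyperparameters (`radius`, `n_samples`) are hypothetical.

```python
# Sketch: approximate local partition density of a ReLU network around a sample
# by counting distinct ReLU activation patterns among nearby perturbations.
# Illustrative only; not the paper's exact method.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy ReLU MLP; the hooks below assume nonlinearities are nn.ReLU modules.
model = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

def local_complexity(model, x, radius=0.05, n_samples=512):
    """Count distinct ReLU activation patterns among perturbations of norm `radius` around x."""
    # Sample perturbation directions of fixed norm `radius` around the sample.
    noise = torch.randn(n_samples, x.numel())
    noise = radius * noise / noise.norm(dim=1, keepdim=True)
    points = x.unsqueeze(0) + noise

    # Record the sign pattern of every ReLU's pre-activations via forward hooks.
    patterns, hooks = [], []
    def hook(_module, inputs, _output):
        patterns.append(inputs[0] > 0)
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            hooks.append(m.register_forward_hook(hook))
    with torch.no_grad():
        model(points)
    for h in hooks:
        h.remove()

    # Concatenate per-layer patterns and count unique rows
    # (each unique pattern corresponds to one linear region hit).
    codes = torch.cat(patterns, dim=1).to(torch.int8)
    return torch.unique(codes, dim=0).shape[0]

x = torch.randn(2)
print("distinct linear regions near x:", local_complexity(model, x))
```

A larger count indicates a denser partition (higher local complexity) around the sample; tracking this quantity over training steps, without any labels, is the kind of measurement the abstract describes.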
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9446