Keywords: Implicit bias, neural collapse, gradient descent
TL;DR: We analyze the gradient-descent dynamics of the Averaged Sample Margin (ASM) loss in closed form and derive insights for practical training improvements.
Abstract: Recent works have studied implicit biases in deep learning, especially the behavior of last-layer features and classifier weights. However, they usually need to simplify the dynamics under gradient descent due to the intractability of loss functions and neural architectures. In this paper, we introduce a concise surrogate loss, the Averaged Sample Margin (ASM) loss, which makes the closed-form dynamics mathematically tractable while requiring few simplifications or assumptions, and accommodates practical considerations. Based on the layer-peeled model, which treats last-layer features as free optimization variables, we build a complete analysis of the unconstrained, regularized, and spherically constrained cases. We show that these dynamics mainly \textit{converge exponentially fast} to a solution that depends on the initialization of the features and classifier weights, which can help explain why training deep neural networks usually takes only a few hundred epochs. Our theoretical results also yield insights for improving practical training with the ASM loss or other losses, such as explicit feature regularization and a rescaled learning rate in the spherically constrained case. Finally, we empirically demonstrate these theoretical results and insights with extensive experiments.
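The abstract does not spell out the ASM loss or the layer-peeled dynamics, so the sketch below is illustrative only. It assumes the ASM loss is the negative sample margin (true-class logit minus the mean logit over all classes), averaged over samples, and runs projected gradient descent in the spherically constrained layer-peeled setting, where features H and weights W are free variables on unit spheres. All names (`unit`, the diagnostics) and constants are hypothetical, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, n = 4, 16, 8            # classes, feature dimension, samples per class
N = K * n
lr, steps = 1.0, 2000

y = np.repeat(np.arange(K), n)  # labels, n samples per class

def unit(X):
    # Project each row onto the unit sphere (the spherical constraint).
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Layer-peeled model: last-layer features H and classifier weights W
# are treated as free optimization variables, here initialized randomly.
H = unit(rng.normal(size=(N, d)))
W = unit(rng.normal(size=(K, d)))

# Assumed ASM loss: L = -(1/N) * sum_i [logit(i, y_i) - mean_k logit(i, k)].
# It is linear in the logits H @ W.T, so its logit-gradient G is a constant
# matrix: +1/(N*K) everywhere, with an extra -1/N at each true class.
G = np.full((N, K), 1.0 / (N * K))
G[np.arange(N), y] -= 1.0 / N

for _ in range(steps):
    H = unit(H - lr * (G @ W))     # projected gradient step on features
    W = unit(W - lr * (G.T @ H))   # projected gradient step on weights

# Neural-collapse diagnostics at convergence:
within = np.mean([H[y == k].std(axis=0).mean() for k in range(K)])
gram = W @ W.T
print("ASM loss:", (G * (H @ W.T)).sum())
print("within-class feature spread:", within)   # ~0: features collapse
print("cos(w_0, w_1):", gram[0, 1])             # ~ -1/(K-1): simplex-ETF-like
</code -->
```

Under these assumptions, the iterates typically approach a collapsed configuration: within-class features merge and the classifier directions spread toward a simplex-ETF-like geometry, with the loss gap shrinking roughly geometrically. Note also that near a stationary point the tangential gradient component vanishes, so the effective step on the sphere shrinks; this is one plausible reading of why the abstract suggests a rescaled learning rate for the spherical case.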
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip