On the Dynamics under the Averaged Sample Margin Loss and Beyond

Xiong Zhou; Xianming Liu; Hanzhang Wang; Deming Zhai; Junjun Jiang; Xiangyang Ji

22 Sept 2022 (modified: 13 Feb 2023); ICLR 2023 Conference Withdrawn Submission; Readers: Everyone

Keywords: Implicit bias, neural collapse, gradient descent

TL;DR: We investigate the dynamics of the averaged sample margin loss and provide some insights for improvements.

Abstract: Recent works have studied implicit biases in deep learning, especially the behavior of last-layer features and classifier weights. However, they usually need to simplify the dynamics under gradient descent because the loss functions and neural architectures are intractable. In this paper, we introduce a concise surrogate loss function, the Averaged Sample Margin (ASM) loss, which makes the closed-form dynamics easier to analyze while requiring few simplifications or assumptions, and allows for more practical considerations. Based on the layer-peeled model, which views last-layer features as free optimization variables, we build a complete analysis for the unconstrained, regularized, and spherically constrained cases. We show that these dynamics mainly converge exponentially fast to a solution that depends on the initialization of the features and classifier weights, which can help explain why training deep neural networks usually takes only a few hundred epochs. Our theoretical results also suggest improvements for practical training with the ASM loss or other losses, such as explicit feature regularization and a rescaled learning rate for spherical cases. Finally, we empirically demonstrate these theoretical results and insights with extensive experiments.

Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning

Supplementary Material: zip
