Towards better generalization: Weight Decay induces low-rank bias for neural networks

27 Sept 2024 (modified: 22 Nov 2024)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Low-rank bias, ReLU Neural Networks, Generalization Error, Implicit regularization, SGD, Weight Decay
TL;DR: Both theoretically and empirically, we show that weight decay leads to low rank bias of neural network and better generalization performance.
Abstract:

We study the implicit bias towards low-rank weight matrices when training neural networks (NN) with Weight Decay (WD). We prove that when a ReLU NN is sufficiently trained with Stochastic Gradient Descent (SGD) and WD, its weight matrix is approximately a rank-two matrix. Empirically, we demonstrate that WD is a necessary condition for inducing this low-rank bias across both regression and classification tasks. Our work differs from previous studies as our theoretical analysis does not rely on common assumptions regarding the training data distribution, optimality of weight matrices, or specific training procedures. Furthermore, by leveraging the low-rank bias, we derive improved generalization error bounds and provide numerical evidence showing that better generalization can be achieved. Thus, our work offers both theoretical and empirical insights into the strong generalization performance of SGD when combined with WD.

Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11816
Loading