Implicit Sparse Regularization: The Impact of Depth and Early Stopping

Published: 09 Nov 2021, Last Modified: 22 Oct 2023
Venue: NeurIPS 2021 Poster
Readers: Everyone
Keywords: gradient descent, implicit regularization, early stopping, initialization, depth, sparse recovery
Abstract: In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-$N$ networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon that we call \emph{implicit sparse regularization}. This result is in sharp contrast to known results for noiseless and uncorrelated-design cases. We characterize the impact of depth and early stopping and show that for a general depth parameter $N$, gradient descent with early stopping achieves minimax optimal sparse recovery with sufficiently small initialization $w_0$ and step size $\eta$. In particular, we show that increasing depth enlarges the scale of working initialization and the early-stopping window so that this implicit sparse regularization effect is more likely to take place.
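Illustrative sketch (not the authors' released code; see the repository linked under Code below): the snippet fits a synthetic sparse regression problem with a depth-$N$ parametrization $\beta = w_+^{\odot N} - w_-^{\odot N}$, trained by plain gradient descent from a small initialization $w_0$ with step size $\eta$, and early-stopped on a held-out split. The depth $N$, initialization scale, step size, and problem dimensions are illustrative assumptions, chosen only to make the parametrization and the early-stopping selection concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse regression: y = X @ beta_star + noise.
n, p, k, sigma = 200, 500, 5, 0.1
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:k] = 1.0
y = X @ beta_star + sigma * rng.standard_normal(n)

# Held-out split used only to select the early-stopping iterate.
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

def run_gd(N=3, w0=1e-2, eta=0.05, n_steps=10000):
    """Gradient descent on the depth-N reparametrized least-squares loss."""
    w_plus = np.full(p, w0)    # beta = w_plus**N - w_minus**N (elementwise)
    w_minus = np.full(p, w0)
    best_val, best_beta = np.inf, None
    for _ in range(n_steps):
        beta = w_plus**N - w_minus**N
        grad_beta = X_tr.T @ (X_tr @ beta - y_tr) / len(y_tr)
        # Chain rule through the elementwise power parametrization.
        w_plus -= eta * N * w_plus**(N - 1) * grad_beta
        w_minus += eta * N * w_minus**(N - 1) * grad_beta
        beta = w_plus**N - w_minus**N
        val = np.mean((X_va @ beta - y_va) ** 2)
        if val < best_val:      # early stopping: keep the best held-out iterate
            best_val, best_beta = val, beta.copy()
    return best_beta

beta_hat = run_gd()
print("l2 recovery error:", np.linalg.norm(beta_hat - beta_star))
```

The sketch only demonstrates the mechanism described in the abstract (small initialization, gradient descent, early stopping); the paper's theoretical claims about how depth $N$ enlarges the workable initialization scale and the early-stopping window are not reproduced here.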
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
TL;DR: We study the implicit regularization and early stopping of gradient descent for sparse regression with depth-$N$ networks.
Supplementary Material: pdf
Code: https://github.com/jiangyuan2li/Implicit-Sparse-Regularization
Community Implementations: [1 code implementation on CatalyzeX](https://www.catalyzex.com/paper/arxiv:2108.05574/code)