Implicit bias of gradient descent for mean squared error regression with wide neural networks

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Implicit bias, overparametrized neural network, cubic spline interpolation, spatially adaptive smoothing spline, effective capacity
Abstract: We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For 1D regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{-1/2}$ of the function which fits the training data and whose difference from initialization has the smallest 2-norm of the second derivative weighted by $1/\zeta$. The curvature penalty function $1/\zeta$ is expressed in terms of the probability distribution used to initialize the network parameters, and we compute it explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolant of the training data. While similar results have been obtained in previous works, our analysis clarifies important details and allows us to obtain significant generalizations. In particular, the result generalizes to multivariate regression and different activation functions. Moreover, we show that the training trajectories are captured by trajectories of spatially adaptive smoothing splines with decreasing regularization strength.
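The following is a minimal sketch (not the authors' code) of the effect described in the abstract: a wide shallow ReLU network trained by full-batch gradient descent on a few 1D points is compared against the natural cubic spline through the same points. The data, the width, the uniform initialization ranges, the step size, and the restriction to training only the output weights (a lazy-regime simplification) are all illustrative assumptions; the comparison uses SciPy's `CubicSpline` with natural boundary conditions.

```python
# Illustrative sketch: wide shallow ReLU network trained by gradient descent
# on 1D data vs. the natural cubic spline through the same data.
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)

# A handful of 1D training points (made up for illustration).
x_train = np.array([-0.8, -0.3, 0.1, 0.5, 0.9])
y_train = np.array([0.2, -0.4, 0.3, 0.1, -0.2])
m = len(x_train)

# Width-n shallow ReLU network f(x) = sum_k v_k * relu(w_k * x + b_k),
# with uniform initialization of all parameters (assumed ranges).
n = 2000
w = rng.uniform(-1.0, 1.0, n)
b = rng.uniform(-1.0, 1.0, n)
v0 = rng.uniform(-1.0, 1.0, n) / np.sqrt(n)   # output weights at initialization

def features(x):
    """ReLU feature map of shape (len(x), n)."""
    return np.maximum(np.outer(x, w) + b, 0.0)

Phi = features(x_train)
f_init_train = Phi @ v0                        # network output at initialization

# Full-batch gradient descent on the mean squared error, training only the
# output weights; the step size is set from the largest curvature of the loss.
lr = 1.0 / np.linalg.eigvalsh(Phi @ Phi.T / m).max()
v = v0.copy()
for _ in range(200_000):
    v -= lr * Phi.T @ (Phi @ v - y_train) / m

# Difference of the trained network from its initialization, on a grid
# spanning the training inputs.
x_grid = np.linspace(x_train.min(), x_train.max(), 400)
g_net = features(x_grid) @ (v - v0)

# Natural cubic spline interpolating the same residual targets.
spline = CubicSpline(x_train, y_train - f_init_train, bc_type="natural")
print("max |network - spline| on the grid:", np.abs(g_net - spline(x_grid)).max())
```

Under these assumptions the two curves should agree within the span of the training points up to a finite-width deviation of the order $n^{-1/2}$ stated in the abstract; increasing `n` tightens the match.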
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=rzjjCt4Y2L