Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks

Published: 28 Jan 2022, Last Modified: 13 Feb 2023. ICLR 2022 Poster.
Keywords: underparameterized regime, spectral bias, neural tangent kernel, implicit bias, implicit regularization, gradient flow
Abstract: We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow. We show that in the underparameterized regime the network learns eigenfunctions of an integral operator $T_K$ determined by the Neural Tangent Kernel at rates corresponding to their eigenvalues. For example, for uniformly distributed data on the sphere $S^{d - 1}$ and rotation-invariant weight distributions, the eigenfunctions of $T_K$ are the spherical harmonics. Our results can be understood as describing a spectral bias in the underparameterized regime. The proofs use the concept of "Damped Deviations", where deviations of the NTK matter less for eigendirections with large eigenvalues. Beyond the underparameterized regime, the damped-deviations point of view allows us to extend certain results in the literature to the overparameterized setting.
One-sentence Summary: Underparameterized networks optimizing MSE learn eigenfunctions of an NTK integral operator at rates corresponding to their eigenvalues.
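Note (not part of the paper): the spectral-bias claim above can be illustrated by a minimal sketch of the linearized, kernel-regime gradient-flow dynamics; the symbols $u_t$, $\phi_i$, $\lambda_i$, $a_i$ below are illustrative and not taken from the paper. Writing the residual as $u_t = f_t - f^*$ and assuming it evolves as $\partial_t u_t = -T_K u_t$ with $T_K \phi_i = \lambda_i \phi_i$ and $u_0 = \sum_i a_i \phi_i$, one gets
$$u_t = \sum_i a_i\, e^{-\lambda_i t}\, \phi_i,$$
so the component along each eigenfunction decays at a rate set by its eigenvalue, which is the sense in which eigenfunctions with larger eigenvalues are learned faster. The paper's "Damped Deviations" argument controls how deviations of the finite-width NTK from its limit perturb this idealized picture, with the eigendirections of large $\lambda_i$ affected least.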