Implicit Bias of MSE Gradient Optimization in Underparameterized Neural NetworksDownload PDF


Sep 29, 2021 (edited Oct 05, 2021)ICLR 2022 Conference Blind SubmissionReaders: Everyone
  • Keywords: underparameterized regime, spectral bias, neural tangent kernel, implicit bias, implicit regularization, gradient flow
  • Abstract: We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow. We show that in the underparameterized regime the network learns eigenfunctions of an integral operator $T_K$ determined by the Neural Tangent Kernel at rates corresponding to their eigenvalues. For example, for uniformly distributed data on the sphere $S^{d - 1}$ and rotation invariant weight distributions, the eigenfunctions of $T_K$ are the spherical harmonics. Our results can be understood as describing a spectral bias in the underparameterized regime. The proofs use the concept of ``Damped Deviations'' where deviations of the NTK matter less for eigendirections with large eigenvalues. Aside from the underparameterized regime, the damped deviations point-of-view allows us to extend certain results in the literature in the overparameterized setting.
  • One-sentence Summary: Underparameterized networks optimizing MSE learn eigenfunctions of an NTK integral operator at rates corresponding to their eigenvalues.
0 Replies