Explicit loss asymptotics in the gradient descent training of neural networks

21 May 2021, 20:52 (edited 23 Oct 2021) | NeurIPS 2021 Poster | Readers: Everyone
  • Keywords: Deep learning theory, neural networks, gradient descent, Neural Tangent Kernel, spectral theory
  • TL;DR: We derive explicit loss asymptotics for gradient descent training of neural networks
  • Abstract: Current theoretical results on optimization trajectories of neural networks trained by gradient descent typically take the form of rigorous but potentially loose bounds on the loss values. In the present work we take a different approach and show that the learning trajectory of a wide network in a lazy training regime can be characterized by an explicit asymptotic formula at large training times. Specifically, the leading term in the asymptotic expansion of the loss behaves as a power law $L(t) \sim C t^{-\xi}$, with the exponent $\xi$ expressed only through the data dimension, the smoothness of the activation function, and the class of functions being approximated. Our results are based on spectral analysis of the integral operator representing the linearized evolution of a large network trained on the expected loss. Importantly, the techniques we employ do not require a specific form of the data distribution (for example, Gaussian), which makes our findings fairly universal.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
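
To illustrate how a power law $L(t) \sim C t^{-\xi}$ can emerge from the spectrum of a linearized evolution operator, here is a minimal numerical sketch. It is not the paper's derivation: it assumes, purely for illustration, that the operator has eigenvalues $\lambda_k = k^{-\nu}$ and that the target has squared coefficients $c_k^2 = k^{-\kappa}$ in the eigenbasis. Under gradient flow on a quadratic loss each mode decays independently, so $L(t) = \tfrac{1}{2}\sum_k c_k^2 e^{-2\lambda_k t}$, and for these assumed spectra a standard integral approximation gives $\xi = (\kappa - 1)/\nu$. The names `linearized_loss`, `fitted_exponent`, `nu` and `kappa` are hypothetical and do not come from the paper.

```python
import numpy as np


def linearized_loss(t, eigvals, coeffs_sq):
    """Loss of a linear model under gradient flow on a quadratic loss.

    Each eigenmode k decays independently as exp(-2 * lambda_k * t), so
    L(t) = 0.5 * sum_k c_k^2 * exp(-2 * lambda_k * t).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    decay = np.exp(-2.0 * eigvals[None, :] * t[:, None])
    return 0.5 * np.sum(coeffs_sq[None, :] * decay, axis=1)


def fitted_exponent(t, loss):
    """Estimate xi from a least-squares line fit of log L against log t."""
    slope, _intercept = np.polyfit(np.log(t), np.log(loss), 1)
    return -slope


# Assumed (illustrative) power-law spectra, not taken from the paper.
nu, kappa = 2.0, 2.0
k = np.arange(1, 100_001, dtype=float)
eigvals = k ** (-nu)       # lambda_k = k^(-nu)
coeffs_sq = k ** (-kappa)  # c_k^2 = k^(-kappa)

# Times large enough for the asymptotic regime, yet far below
# 1 / lambda_min so that truncating the spectrum has little effect.
t = np.logspace(2, 4, 20)
loss = linearized_loss(t, eigvals, coeffs_sq)

xi_fit = fitted_exponent(t, loss)
xi_pred = (kappa - 1.0) / nu
print(f"fitted exponent: {xi_fit:.3f}, predicted (kappa - 1) / nu: {xi_pred:.3f}")
```

In this toy setting the exponent depends only on the two assumed decay rates. According to the abstract, the paper obtains the corresponding quantities from the data dimension, the smoothness of the activation function, and the class of target functions.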