Towards a Complete Theory of Neural Networks with Few Neurons

Published: 01 Feb 2023, Last Modified: 13 Feb 2023
ICLR 2023 Conference Withdrawn Submission
Readers: Everyone
Keywords: theory of neural networks, non-convex landscapes, critical manifolds, gradient flow dynamics
TL;DR: We analytically study the landscapes of neural networks with a few neurons, shedding light on how the neurons move following gradient flow.
Abstract: Deep learning has seen unprecedented progress thanks to the deployment of models with millions of parameters. On the theoretical side, an immense amount of effort has gone into understanding the dynamics of overparameterized networks. Although there is now a well-developed theory of networks with infinitely many neurons, the classic problem of understanding how a neural network with a few neurons learns remains unsolved. To attack this problem, we analytically study the landscapes of neural networks with few neurons. We prove for the first time that a student network with one neuron has only one critical point, its global minimum, when learning from a teacher network with arbitrarily many orthogonal neurons. In addition, we prove that a neuron-addition mechanism turns a minimum into a line of critical points, along which the points transition from saddles to local minima via non-strict saddles. Finally, we discuss how the insights gained from our novel proof techniques may shed light on the dynamics of neural networks with few neurons.
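To make the student-teacher setup in the abstract concrete, here is a minimal, hypothetical sketch: a one-neuron student trained by discretized gradient flow (small-step gradient descent) against a teacher with orthogonal neurons. The ReLU activation, Gaussian inputs, squared loss, and all dimensions and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a one-neuron student learning from a teacher with
# orthogonal neurons. Activation, input distribution, and hyperparameters
# are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 4          # input dimension, number of teacher neurons (assumed)
n = 20000            # Monte Carlo samples approximating the expected loss

W_teacher = np.eye(d)[:k]        # k orthogonal teacher neurons (rows)
X = rng.standard_normal((n, d))  # standard Gaussian inputs (assumed)
y = np.maximum(X @ W_teacher.T, 0.0).sum(axis=1)  # teacher output (ReLU)

w = rng.standard_normal(d)       # the single student neuron

lr, steps = 1e-2, 2000           # small step size mimics gradient flow
for t in range(steps):
    pre = X @ w                       # student pre-activation
    resid = np.maximum(pre, 0.0) - y  # residual of the squared loss
    # gradient of (1/2n) * ||relu(X w) - y||^2 with respect to w
    grad = X.T @ (resid * (pre > 0)) / n
    w -= lr * grad

print("final loss:", 0.5 * np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```

Under the abstract's claim that the one-neuron landscape has a unique critical point, a run like this from generic initialization should converge toward that global minimum; the sketch only provides an empirical playground for that behavior, not a proof.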
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Theory (e.g., control theory, learning theory, algorithmic game theory)