The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

28 Sept 2020 (modified: 22 Oct 2023) ICLR 2021 Conference Blind Submission Readers: Everyone
Keywords: Gradient descent, neural networks, implicit regularization, quenching-activation
Abstract: A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes. It is found that the GD dynamics in the under-parameterized regime has two distinctive phases: an early phase in which the GD dynamics closely follows that of the corresponding random feature model, followed by a late phase in which the neurons divide into two groups: a few (possibly none) "activated" neurons that dominate the dynamics, and a group of "quenched" neurons that support the continued activation and deactivation process. In particular, when the target function can be accurately approximated by a relatively small number of neurons, this quenching-activation process biases GD towards picking sparse solutions. This neural network-like behavior continues into the mildly over-parameterized regime, where it undergoes a transition to a random feature-like behavior in which the inner-layer parameters are effectively frozen during training. The quenching process appears to provide a clear mechanism for "implicit regularization". This is qualitatively different from the GD dynamics under the "mean-field" scaling, where all neurons participate equally.
One-sentence Summary: The gradient descent dynamics for two-layer neural networks exhibits a quenching-activation behavior.
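Illustrative Sketch: The following is a minimal sketch, not the authors' code, of the kind of experiment the abstract describes: full-batch GD on a two-layer ReLU network under the conventional scaling, tracking the outer-layer coefficients to see a few "activated" neurons (large |a_k|) emerge while the rest remain "quenched". The target function, width, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the authors' code): full-batch GD on a two-layer ReLU
# network f(x) = sum_k a_k * relu(w_k . x), conventional scaling.
# Hypothetical choices: a target representable by a single neuron, width m = 20,
# learning rate 0.05, 20000 GD steps.

rng = np.random.default_rng(0)
n, d, m = 200, 5, 20                      # samples, input dimension, network width
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = np.maximum(X @ w_star, 0.0)           # target: a single ReLU neuron

W = rng.standard_normal((m, d)) / np.sqrt(d)   # inner-layer weights
a = rng.standard_normal(m) / np.sqrt(m)        # outer-layer coefficients

lr, steps = 0.05, 20000
for t in range(steps):
    H = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
    resid = H @ a - y                     # residuals of the squared loss
    grad_a = H.T @ resid / n              # gradient w.r.t. outer weights
    mask = (X @ W.T > 0).astype(float)    # ReLU derivative
    grad_W = ((resid[:, None] * mask) * a).T @ X / n   # gradient w.r.t. inner weights
    a -= lr * grad_a
    W -= lr * grad_W

# A few |a_k| dominating ("activated") while the rest stay small ("quenched")
# is the sparse-solution bias discussed in the abstract.
order = np.argsort(-np.abs(a))
print("final loss:", 0.5 * np.mean(resid ** 2))
print("|a_k| sorted:", np.round(np.abs(a[order]), 3))
```

Freezing W at initialization and training only a would give the corresponding random feature model, which the abstract uses as the reference for the early phase of the GD dynamics.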
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2006.14450/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=MTT5jTOWdm