The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

28 Sept 2020 (modified: 22 Oct 2023) ICLR 2021 Conference Blind Submission Readers: Everyone
Keywords: Gradient descent, neural networks, implicit regularization, quenching-activation
Abstract: A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes. It is found that the GD dynamics in the under-parameterized regime has two distinctive phases: an early phase in which the GD dynamics closely follows that of the corresponding random feature model, followed by a late phase in which the neurons divide into two groups: a few (possibly none) "activated" neurons that dominate the dynamics, and a group of "quenched" neurons that support the continued activation and deactivation process. In particular, when the target function can be accurately approximated by a relatively small number of neurons, this quenching-activation process biases GD towards picking sparse solutions. This neural network-like behavior continues into the mildly over-parameterized regime, where it undergoes a transition to a random feature-like behavior in which the inner-layer parameters are effectively frozen during training. The quenching process appears to provide a clear mechanism for "implicit regularization". This is qualitatively different from the GD dynamics under the "mean-field" scaling, where all neurons participate equally.
One-sentence Summary: The gradient descent dynamics for two-layer neural networks exhibits a quenching-activation behavior.
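Illustrative Sketch: The following is a minimal sketch, not the authors' code, of the kind of experiment the abstract describes: full-batch GD on a two-layer ReLU network under the conventional scaling, tracking the outer-layer coefficients to see a few "activated" neurons (large |a_k|) emerge while the rest remain "quenched". The target function, width, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the authors' code): full-batch GD on a two-layer ReLU
# network f(x) = sum_k a_k * relu(w_k . x), conventional scaling.
# Hypothetical choices: a target representable by a single neuron, width m = 20,
# learning rate 0.05, 20000 GD steps.

rng = np.random.default_rng(0)
n, d, m = 200, 5, 20                      # samples, input dimension, network width
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = np.maximum(X @ w_star, 0.0)           # target: a single ReLU neuron

W = rng.standard_normal((m, d)) / np.sqrt(d)   # inner-layer weights
a = rng.standard_normal(m) / np.sqrt(m)        # outer-layer coefficients

lr, steps = 0.05, 20000
for t in range(steps):
    H = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
    resid = H @ a - y                     # residuals of the squared loss
    grad_a = H.T @ resid / n              # gradient w.r.t. outer weights
    mask = (X @ W.T > 0).astype(float)    # ReLU derivative
    grad_W = ((resid[:, None] * mask) * a).T @ X / n   # gradient w.r.t. inner weights
    a -= lr * grad_a
    W -= lr * grad_W

# A few |a_k| dominating ("activated") while the rest stay small ("quenched")
# is the sparse-solution bias discussed in the abstract.
order = np.argsort(-np.abs(a))
print("final loss:", 0.5 * np.mean(resid ** 2))
print("|a_k| sorted:", np.round(np.abs(a[order]), 3))
```

Freezing W at initialization and training only a would give the corresponding random feature model, which the abstract uses as the reference for the early phase of the GD dynamics.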
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2006.14450/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=MTT5jTOWdm