Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width

Hanxu Zhou; Qixu Zhou; Zhenyuan Jin; Tao Luo; Yaoyu Zhang; Zhi-Qin John Xu

Published: 31 Oct 2022, Last Modified: 24 Dec 2022, NeurIPS 2022 Accept, Readers: Everyone

Keywords: training dynamics, neural networks, phase diagram, initialization

Abstract: Substantial work indicates that the training dynamics of neural networks (NNs) are closely tied to how their parameters are initialized. Inspired by the phase diagram for two-layer ReLU NNs with infinite width (Luo et al., 2021), we take a step towards drawing a phase diagram for three-layer ReLU NNs with infinite width. First, we derive a normalized gradient flow for three-layer ReLU NNs and obtain two key independent quantities that distinguish different dynamical regimes for common initialization methods. With carefully designed experiments and a large computational cost, on both synthetic and real datasets, we find that the dynamics of each layer can also be divided into a linear regime and a condensed regime, separated by a critical regime. The criterion is the relative change of the input weights during training as the width approaches infinity (the input weight of a hidden neuron consists of the weights from its input layer to the hidden neuron together with its bias term): it tends to $0$ in the linear regime, $+\infty$ in the condensed regime, and $O(1)$ in the critical regime. We also demonstrate that different layers of a deep NN can lie in different dynamical regimes during the same training process. In the condensed regime, we further observe that the weights condense onto isolated orientations with low complexity. Based on experiments in the three-layer setting, our phase diagram suggests that deep NNs exhibit complicated dynamics consisting of three possible regimes together with their mixtures. It provides guidance for studying deep NNs in different initialization regimes and reveals the possibility of completely different dynamics emerging in different layers of the same deep NN.
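As a rough illustration of the criterion named in the abstract, the relative change of a layer's input weights can be measured as the norm of the change during training divided by the norm at initialization. The sketch below is not taken from the paper: the function name `relative_change`, the use of the Euclidean (Frobenius) norm, and the toy arrays are all assumptions made for illustration.

```python
import numpy as np


def relative_change(w_init, w_final):
    """Return ||w_final - w_init|| / ||w_init|| for one layer.

    Each row is assumed to hold one hidden neuron's input weights
    followed by its bias term (an illustrative layout, not the paper's).
    """
    w_init = np.asarray(w_init, dtype=float)
    w_final = np.asarray(w_final, dtype=float)
    # For 2-D arrays np.linalg.norm defaults to the Frobenius norm.
    return np.linalg.norm(w_final - w_init) / np.linalg.norm(w_init)


# Hypothetical toy data: 4 hidden neurons, 2 input weights plus 1 bias each.
w0 = np.ones((4, 3))   # weights at initialization
w1 = w0 + 0.1          # weights after training (made-up values)
rc = relative_change(w0, w1)
print(rc)
```

Under this reading, one would track the quantity for each layer as the width grows: values shrinking towards zero would point to the linear regime, growing values to the condensed regime, and values staying of order one to the critical regime.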

TL;DR: We make a step towards drawing a phase diagram for three-layer ReLU NNs with infinite width.

Supplementary Material: zip
