Abstract: Empirical works show that for ReLU neural networks (NNs) with small initialization, the input weights of hidden neurons (the input weight of a hidden neuron
consists of the weights from its input layer to the hidden neuron together with its bias term)
condense onto isolated orientations. The condensation dynamics implies that
training implicitly regularizes an NN towards one with a much smaller effective size.
In this work, we illustrate the formation of condensation in multi-layer fully
connected NNs and show that the maximal number of condensed orientations in
the initial training stage is twice the multiplicity of the activation function, where
“multiplicity” refers to the multiplicity of the root of the activation function at the origin. Our
theoretical analysis confirms the experiments in two cases: one is for activation
functions of multiplicity one with input of arbitrary dimension, which covers many
common activation functions, and the other is for layers with one-dimensional
input and arbitrary multiplicity. This work takes a step towards understanding
how small initialization leads NNs to condensation at the initial training stage.
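
As a hedged illustration of the term, one way to formalize “multiplicity” is sketched below. The abstract does not spell out a formula, so the notation is an assumed reading: $\sigma$ denotes the activation function, $\sigma^{(s)}$ its $s$-th derivative, and $p$ the multiplicity.

```latex
\[
  \sigma^{(s)}(0) = 0 \quad \text{for } s = 1, \dots, p-1,
  \qquad
  \sigma^{(p)}(0) \neq 0 .
\]
```

Under this reading, $\tanh(x)$ has $p = 1$ because $\tanh'(0) = 1$, while $x\tanh(x)$ has $p = 2$ because its first derivative vanishes at the origin and its second derivative equals $2$ there. The stated bound of twice the multiplicity would then allow at most $2$ and $4$ condensed orientations, respectively.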