Keywords: deep learning theory, mean-field approximation
Abstract: We study deep fully-connected neural networks using the mean-field formalism and carry out a non-perturbative analysis of signal propagation. As a result, we demonstrate that increasing the depth leads either to gradient explosion or to another undesirable phenomenon we call representation shrinkage. The appearance of at least one of these problems is not restricted to a specific initialization scheme or choice of activation function, but is rather an inherent property of the fully-connected architecture itself. Additionally, we show that many popular normalization techniques fail to mitigate these problems. Our method can also be applied to residual networks to guide the choice of initialization variances.
One-sentence Summary: Non-perturbative analysis of signal propagation reveals new problems with deep networks.
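The sketch below is a minimal numerical illustration of the signal-propagation setting the abstract describes, not the paper's non-perturbative analysis: it assumes a deep fully-connected tanh network at a Gaussian initialization (the width, depth, and weight variance are arbitrary illustrative choices) and tracks how the cosine similarity between two distinct inputs evolves with depth, which is one way to visualize the collapse of representations that the term "representation shrinkage" suggests.

```python
import numpy as np

# Illustrative sketch only: propagate two inputs through a deep
# fully-connected tanh network at random Gaussian initialization and
# track how similar their hidden representations become with depth.
rng = np.random.default_rng(0)
width, depth, sigma_w = 512, 100, 1.5  # arbitrary illustrative values

x1 = rng.standard_normal(width)
x2 = rng.standard_normal(width)

h1, h2 = x1.copy(), x2.copy()
for layer in range(1, depth + 1):
    # Weights scaled by sigma_w / sqrt(width), a common variance convention.
    W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
    h1, h2 = np.tanh(W @ h1), np.tanh(W @ h2)
    if layer % 20 == 0:
        cos = h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2))
        print(f"layer {layer:3d}: cosine similarity = {cos:+.4f}")

# With growing depth the similarity typically converges to a fixed value
# regardless of the inputs, so distinct inputs become hard to tell apart.
```

In a typical run the printed similarity drifts toward a depth-independent fixed point, so the network's deep-layer representations carry progressively less information about which input was fed in; the paper's claim is that avoiding this (or the accompanying gradient explosion) is not merely a matter of tuning `sigma_w` or the activation function.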