Gradient Explosion and Representation Shrinkage in Infinite Networks


Sep 29, 2021 (edited Oct 05, 2021), ICLR 2022 Conference Blind Submission, Readers: Everyone
  • Keywords: deep learning theory, mean-field approximation
  • Abstract: We study deep fully-connected neural networks using the mean field formalism, and carry out a non-perturbative analysis of signal propagation. As a result, we demonstrate that increasing the depth leads to gradient explosion or to another undesirable phenomenon we call representation shrinkage. The appearance of at least one of these problems is not restricted to a specific initialization scheme or a choice of activation function, but rather is an inherent property of the fully-connected architecture itself. Additionally, we show that many popular normalization techniques fail to mitigate these problems. Our method can also be applied to residual networks to guide the choice of initialization variances.
  • One-sentence Summary: Non-perturbative analysis of signal propagation reveals new problems with deep networks.