A Functional Characterization of Randomly Initialized Gradient Descent in Deep ReLU Networks

25 Sep 2019 (modified: 24 Dec 2019) · ICLR 2020 Conference Blind Submission · Readers: Everyone
  • Keywords: Inductive Bias, Generalization, Interpretability, Functional Characterization, Loss Surface, Initialization
  • TL;DR: A functional approach reveals that flat initialization, preserved by gradient descent, leads to generalization ability.
  • Abstract: Despite their popularity and successes, deep neural networks are poorly understood theoretically and are often treated as 'black box' systems. A functional view of these networks gives us a useful new lens through which to understand them. This allows us to theoretically or experimentally probe properties of these networks, including the effect of standard initializations, the value of depth, the underlying loss surface, and the origins of generalization. One key result is that generalization arises from the smoothness of the functional approximation, combined with a flat initial approximation. This smoothness increases with the number of units, explaining why massively overparameterized networks continue to generalize well.