A Functional Characterization of Randomly Initialized Gradient Descent in Deep ReLU Networks

Justin Sahs; Aneel Damaraju; Ryan Pyle; Onur Tavaslioglu; Josue Ortega Caro; Hao Yang Lu; Ankit Patel

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: Inductive Bias, Generalization, Interpretability, Functional Characterization, Loss Surface, Initialization

TL;DR: A functional approach reveals that flat initialization, preserved by gradient descent, leads to generalization ability.

Abstract: Despite their popularity and successes, deep neural networks are poorly understood theoretically and are treated as 'black box' systems. A functional view of these networks gives us a useful new lens with which to understand them. This view allows us to probe properties of these networks theoretically or experimentally, including the effect of standard initializations, the value of depth, the underlying loss surface, and the origins of generalization. One key result is that generalization results from the smoothness of the functional approximation, combined with a flat initial approximation. This smoothness increases with the number of units, explaining why massively overparameterized networks continue to generalize well.
