Keywords: Hessian Eigenvalues, Residual Connections, Condition Number, Spectrum
TL;DR: Increasing network depth produces more outlier eigenvalues in the Hessian. Residual connections mitigate this.
Abstract: It is well known that deeper neural networks are harder to train than shallower ones. In this short paper, we use the full eigenvalue spectrum of the Hessian to explore how the loss landscape changes as the network gets deeper and as residual connections are added to the architecture. Computing a series of quantitative measures on the Hessian spectrum, we show that the Hessian eigenvalue distribution in deeper networks has substantially heavier tails (equivalently, more outlier eigenvalues), which makes the network harder to optimize with first-order methods. We further show that adding residual connections largely mitigates this effect, suggesting a mechanism by which residual connections improve training.
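
As a toy illustration of why outlier eigenvalues can hinder first-order methods, the sketch below runs plain gradient descent on a diagonal quadratic loss whose Hessian eigenvalues are chosen by hand. This is not the paper's experiment: the quadratic model, the eigenvalue lists, the tolerance, and the helper names `condition_number` and `gd_steps_to_converge` are all assumptions made for illustration.

```python
def condition_number(eigenvalues):
    """Ratio of largest to smallest eigenvalue (all assumed positive)."""
    return max(eigenvalues) / min(eigenvalues)


def gd_steps_to_converge(eigenvalues, tol=1e-6, max_steps=1_000_000):
    """Count gradient-descent steps on f(x) = 0.5 * sum(lam_i * x_i**2).

    The Hessian of f is diagonal with the given eigenvalues, so each
    coordinate evolves independently as x_i <- x_i * (1 - lr * lam_i).
    """
    # Step size 1 / lambda_max: the largest eigenvalue limits how big a
    # step can be taken, so a single outlier shrinks the step for everyone.
    lr = 1.0 / max(eigenvalues)
    x = [1.0] * len(eigenvalues)
    for step in range(max_steps):
        loss = 0.5 * sum(lam * xi * xi for lam, xi in zip(eigenvalues, x))
        if loss < tol:
            return step
        x = [xi * (1.0 - lr * lam) for lam, xi in zip(eigenvalues, x)]
    return max_steps


# Hypothetical spectra: a tight bulk, and the same bulk plus one outlier.
bulk = [1.0, 1.2, 1.5, 2.0]
with_outlier = bulk + [200.0]

kappa_bulk = condition_number(bulk)
kappa_outlier = condition_number(with_outlier)
steps_bulk = gd_steps_to_converge(bulk)
steps_outlier = gd_steps_to_converge(with_outlier)

print(f"bulk only:    kappa={kappa_bulk:.0f}, steps={steps_bulk}")
print(f"with outlier: kappa={kappa_outlier:.0f}, steps={steps_outlier}")
```

In this toy setting, the single outlier forces a much smaller step size, so the directions belonging to the bulk eigenvalues converge far more slowly. The abstract's claim concerns real network Hessians, for which this quadratic is only a local caricature.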