The loss surface of residual networks: Ensembles and the role of batch normalization
Etai Littwin, Lior Wolf
Nov 04, 2016 (modified: Dec 18, 2016) · ICLR 2017 conference submission · readers: everyone
Abstract: Deep Residual Networks offer a performance premium over conventional networks of the same depth and are trainable at extreme depths. It has recently been shown that Residual Networks behave like ensembles of relatively shallow networks. We show that these ensembles are dynamic: while initially the virtual ensemble is mostly at depths lower than half the network's depth, as training progresses, it becomes deeper and deeper. The main mechanism that controls the dynamic ensemble behavior is the scaling introduced, e.g., by the Batch Normalization technique. We explain this behavior and demonstrate the driving force behind it. As a main tool in our analysis, we employ generalized spin glass models, which we also use to study the number of critical points in the optimization of Residual Networks.
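To make the scaling mechanism concrete, below is a minimal sketch (not part of the submission) of the unrolled-ensemble view of a residual network. It assumes, as a simplification, that every residual branch contributes the same scalar gain lam, so a path passing through k of the n residual branches carries weight C(n, k) * lam**k; the effective-depth distribution is then binomial with mean n * lam / (1 + lam). The function name path_weights is hypothetical.

```python
import numpy as np
from math import comb

def path_weights(n_blocks: int, lam: float) -> np.ndarray:
    """Normalized weight of each path length k in the unrolled view of a
    residual net with n_blocks blocks, assuming every residual branch
    contributes a uniform scalar gain lam (an illustrative simplification).
    A path taking the residual branch k times has raw weight C(n, k) * lam**k.
    """
    w = np.array([comb(n_blocks, k) * lam ** k for k in range(n_blocks + 1)])
    return w / w.sum()

n = 50
for lam in (0.5, 1.0, 2.0):
    w = path_weights(n, lam)
    # Mean effective path depth under this weighting.
    mean_depth = float(np.arange(n + 1) @ w)
    print(f"lam={lam}: mean effective path depth = {mean_depth:.1f} of {n}")
```

Under this toy model, a gain below 1 concentrates the path mass below n/2, matching the claim that the initial virtual ensemble is mostly shallower than half the network's depth; as training drives the effective scaling up, the mean path depth moves past n/2 and the ensemble deepens.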
TL;DR: Residual nets are dynamic ensembles
Keywords: Deep learning, Theory