Abstract: Deep Residual Networks present a premium in performance in comparison to conventional
networks of the same depth and are trainable at extreme depths. It has
recently been shown that Residual Networks behave like ensembles of relatively
shallow networks. We show that these ensemble are dynamic: while initially
the virtual ensemble is mostly at depths lower than half the network’s depth, as
training progresses, it becomes deeper and deeper. The main mechanism that controls
the dynamic ensemble behavior is the scaling introduced, e.g., by the Batch
Normalization technique. We explain this behavior and demonstrate the driving
force behind it. As a main tool in our analysis, we employ generalized spin glass
models, which we also use in order to study the number of critical points in the
optimization of Residual Networks.
TL;DR: Residual nets are dynamic ensembles
Conflicts: none
Keywords: Deep learning, Theory
18 Replies
Loading