Abstract: Very deep convolutional networks with hundreds of layers
have led to significant reductions in error on competitive benchmarks.
Although the unmatched expressiveness of the many layers can be highly
desirable at test time, training very deep networks comes with its own
set of challenges. The gradients can vanish, the forward flow often diminishes,
and the training time can be painfully slow. To address these
problems, we propose stochastic depth, a training procedure that enables
the seemingly contradictory setup of training short networks and using deep
networks at test time. We start with very deep networks but, during training,
for each mini-batch, randomly drop a subset of layers and bypass
them with the identity function. This simple approach complements the
recent success of residual networks. It reduces training time substantially
and improves the test error significantly on almost all data sets that we
used for evaluation. With stochastic depth we can increase the depth
of residual networks even beyond 1200 layers and still yield meaningful
improvements in test error (4.91% on CIFAR-10).
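To make the mechanism concrete, below is a minimal PyTorch-style sketch of a residual block trained with stochastic depth; it is an illustrative rendering of the idea described above, not the authors' implementation. The names `residual_fn` and `survival_prob` are assumptions, and a single constant survival probability is used here for simplicity.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block that is randomly bypassed during training.

    `residual_fn` stands in for the block's convolutional layers;
    `survival_prob` is the probability the block is kept for a mini-batch.
    Both names are illustrative, not from the paper.
    """
    def __init__(self, residual_fn: nn.Module, survival_prob: float = 0.8):
        super().__init__()
        self.residual_fn = residual_fn
        self.survival_prob = survival_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # For each mini-batch, keep the block with probability
            # `survival_prob`; otherwise skip it and pass the input
            # through the identity function.
            if torch.rand(1).item() < self.survival_prob:
                return x + self.residual_fn(x)
            return x
        # At test time every block is active; scaling the residual by the
        # survival probability keeps expected activations consistent with
        # training.
        return x + self.survival_prob * self.residual_fn(x)
```

Because entire residual branches are skipped for a fraction of mini-batches, the expected depth of the network seen during training is shorter than the full network evaluated at test time, which is where the training-time savings come from.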
Recommender: Ian Goodfellow