## Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates

Feb 15, 2018 (modified: Oct 11, 2017) Blind Submission readers: everyone
• Abstract: In this paper, we show a phenomenon, which we named "super-convergence", where residual networks can be trained using an order of magnitude fewer iterations than are used with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with cyclical learning rates and a large maximum learning rate. Furthermore, we present evidence that training with large learning rates improves performance by regularizing the network. In addition, we show that super-convergence provides a greater boost in performance relative to standard training when the amount of labeled training data is limited. We also derive a simplification of the Hessian Free optimization method to compute an estimate of the optimal learning rate. The architectures to replicate this work will be made available upon publication.
• TL;DR: Empirical proof of a new phenomenon requires new theoretical insights and is relevant to the active discussions in the literature on SGD and understanding generalization.
• Keywords: Deep Learning, machine learning
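
The abstract names cyclical learning rates with a large maximum learning rate as a key element of super-convergence. As a rough illustration only, a triangular cyclical schedule might be sketched as below. This is not the authors' code: the function name `triangular_lr`, its parameters, and the numeric values in the usage lines are assumptions chosen for the example, not settings reported on this page.

```python
def triangular_lr(iteration, step_size, min_lr, max_lr):
    """Return the learning rate at `iteration` for a triangular cyclical schedule.

    Hypothetical helper for illustration. One full cycle lasts 2 * step_size
    iterations: the rate rises linearly from min_lr to max_lr over the first
    step_size iterations, then falls linearly back to min_lr.
    """
    # Position within the current cycle, in [0, 2 * step_size).
    cycle_pos = iteration % (2 * step_size)
    # Fraction of the way from min_lr to max_lr: 0 at the cycle ends, 1 at the peak.
    frac = 1.0 - abs(cycle_pos / step_size - 1.0)
    return min_lr + (max_lr - min_lr) * frac


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from this page).
    for it in (0, 250, 500, 750, 1000):
        print(it, triangular_lr(it, step_size=500, min_lr=0.1, max_lr=3.0))
```

In practice the returned value would be assigned to the optimizer's learning rate before each training iteration; the choice of `step_size`, `min_lr`, and `max_lr` is left open here.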