Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates

15 Feb 2018 (modified: 15 Feb 2018) · ICLR 2018 Conference Blind Submission · Readers: Everyone
Keywords: Deep Learning, Machine Learning
TL;DR: Empirical proof of a new phenomenon requires new theoretical insights and is relevant to the active discussions in the literature on SGD and understanding generalization.
Abstract: In this paper, we describe a phenomenon, which we name "super-convergence", where residual networks can be trained using an order of magnitude fewer iterations than with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with cyclical learning rates and a large maximum learning rate. Furthermore, we present evidence that training with large learning rates improves performance by regularizing the network. In addition, we show that super-convergence provides a greater boost in performance relative to standard training when the amount of labeled training data is limited. We also derive a simplification of the Hessian-free optimization method to compute an estimate of the optimal learning rate. The architectures needed to replicate this work will be made available upon publication.
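For concreteness, below is a minimal sketch of the triangular cyclical learning rate policy from Smith (2017) that the abstract builds on. The hyperparameter values (base_lr, max_lr, step_size) are illustrative placeholders, not the paper's actual settings.

```python
import math

def triangular_clr(iteration, base_lr=0.1, max_lr=3.0, step_size=5000):
    """Triangular CLR policy: linearly ramp the learning rate from
    base_lr up to max_lr and back down, completing one full cycle
    every 2 * step_size iterations."""
    # Which cycle the current iteration falls in (1-indexed).
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Position within the cycle, mapped so x=0 at the peak and x=1 at the ends.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Example: LR at the start, peak, and end of the first cycle.
for it in (0, 5000, 10000):
    print(it, triangular_clr(it))  # 0.1, 3.0, 0.1
```

Super-convergence uses an unusually large max_lr relative to conventional schedules; the exact peak value is problem-dependent and, per the abstract, can be estimated from the paper's simplified Hessian-free computation.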
