Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates

Leslie N. Smith; Nicholay Topin

15 Feb 2018 (modified: 22 Jun 2025) | ICLR 2018 Conference Blind Submission | Readers: Everyone

Abstract: In this paper, we show a phenomenon, which we named "super-convergence", where residual networks can be trained using an order of magnitude fewer iterations than is used with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with cyclical learning rates and a large maximum learning rate. Furthermore, we present evidence that training with large learning rates improves performance by regularizing the network. In addition, we show that super-convergence provides a greater boost in performance relative to standard training when the amount of labeled training data is limited. We also derive a simplification of the Hessian Free optimization method to compute an estimate of the optimal learning rate. The architectures to replicate this work will be made available upon publication.
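
The abstract names cyclical learning rates with a large maximum learning rate as a key element of super-convergence. As a rough illustration only, a triangular cyclical schedule might be sketched as below. The function name `triangular_lr`, its parameters, and the numeric values in the usage lines are assumptions made for this sketch and are not taken from the abstract.

```python
def triangular_lr(iteration, step_size, min_lr, max_lr):
    """Triangular cyclical learning rate (illustrative sketch).

    The rate rises linearly from min_lr to max_lr over step_size
    iterations, then falls linearly back to min_lr over the next
    step_size iterations, and the cycle repeats.
    """
    # Position inside the current cycle, in [0, 2 * step_size).
    cycle_pos = iteration % (2 * step_size)
    # Fraction of the way to the peak: 0 at the cycle ends, 1 at the middle.
    frac = 1.0 - abs(cycle_pos / step_size - 1.0)
    return min_lr + (max_lr - min_lr) * frac


# Hypothetical values chosen only to show the shape of one cycle.
schedule = [triangular_lr(i, step_size=100, min_lr=0.1, max_lr=3.0)
            for i in range(200)]
```

In this sketch a single cycle spans `2 * step_size` iterations, with the peak rate reached at the midpoint; the large `max_lr` is the part the abstract associates with faster training and regularization.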

TL;DR: Empirical proof of a new phenomenon requires new theoretical insights and is relevant to the active discussions in the literature on SGD and understanding generalization.

Keywords: Deep Learning, machine learning

Community Implementations: [12 code implementations](https://www.catalyzex.com/paper/super-convergence-very-fast-training-of/code)
