LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence

Appl. Intell. 2021 (modified: 13 Jun 2021)
Abstract: We present a novel theoretical framework for computing large, adaptive learning rates. Our framework makes minimal assumptions about the activations used and exploits the functional properties of the loss function. Specifically, we show that the inverse of the Lipschitz constant of the loss function is an ideal learning rate. We analytically derive formulas for the Lipschitz constant of several loss functions and, through extensive experimentation, demonstrate the strength of our approach across several architectures and datasets. In addition, we detail how the learning rate is computed when other optimizers, namely SGD with momentum, RMSprop, and Adam, are used. Compared to standard choices of learning rate, our approach converges faster and yields better results.
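
For intuition, here is a minimal, hypothetical sketch of the core idea, not the paper's analytic formulas: a plain PyTorch training loop in which the largest gradient norm observed during an epoch serves as a crude proxy for the Lipschitz constant L of the loss, and the learning rate for the next epoch is set to 1/L. All names (X, y, model, grad_norm) are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: set the SGD learning rate to the inverse of an
# estimated Lipschitz constant of the loss. The paper derives L analytically
# per loss function; here it is approximated by the largest gradient norm.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data and model; purely illustrative.
X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,)).float()
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()

def grad_norm(params):
    """L2 norm of the concatenated gradient over all parameters."""
    return torch.sqrt(sum((p.grad ** 2).sum() for p in params if p.grad is not None))

lr = 0.1  # initial guess; replaced by 1/L after the first epoch
for epoch in range(5):
    max_grad_norm = torch.tensor(0.0)
    for i in range(0, len(X), 32):
        xb, yb = X[i:i + 32], y[i:i + 32]
        model.zero_grad()
        loss = loss_fn(model(xb).squeeze(1), yb)
        loss.backward()
        max_grad_norm = torch.maximum(max_grad_norm, grad_norm(model.parameters()))
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad  # plain SGD step with the adaptive rate
    # Crude Lipschitz-constant proxy: the largest gradient norm seen this epoch.
    L = max_grad_norm.item()
    lr = 1.0 / L if L > 0 else lr
    print(f"epoch {epoch}: loss {loss.item():.4f}, next lr {lr:.4f}")
```

In the paper itself, the Lipschitz constant is instead computed analytically for several loss functions, and the abstract notes that corresponding learning-rate computations are also given for SGD with momentum, RMSprop, and Adam.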