Randomness Helps Rigor: A Probabilistic Learning Rate Scheduler Bridging Theory and Deep Learning Practice
Keywords: Learning rate schedulers, Stochastic gradient descent, Convergence analysis, Deep neural networks
TL;DR: A novel probabilistic learning rate scheduler is proposed; its convergence is analyzed on a restricted class of functions. Excellent empirical results are shown for a broader class, namely training deep neural networks.
Abstract: Learning rate schedulers have shown great success in speeding up the convergence of learning algorithms in practice. However, their convergence to a minimum has not been theoretically proven. This difficulty mainly arises from the fact that, while traditional convergence analysis prescribes monotonically decreasing (or constant) learning rates, schedulers often use rates that both increase and decrease across training epochs. We aim to bridge this gap by proposing a probabilistic learning rate scheduler (PLRS) that does not conform to the monotonically decreasing condition, yet achieves provable convergence guarantees. To demonstrate the practical effectiveness of our approach, we evaluate it on deep neural networks across both vision and language tasks, showing competitive or superior performance compared to state-of-the-art learning rate schedulers. Specifically, our experiments include (a) image classification on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet-1K using ResNet, WRN, VGG, and DenseNet architectures, and (b) language model fine-tuning on the SQuAD v1.1 dataset with pretrained BERT. Notably, on ImageNet-1K with ResNet-50, our method surpasses the leading knee scheduler by 2.79% in classification accuracy.
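The abstract does not specify how PLRS samples its learning rates, so the following is only a minimal hypothetical sketch of the general idea it describes: rates that are drawn at random each epoch (and may therefore go up as well as down), while the random envelope they are drawn from contracts over training. The function name `plrs_schedule` and all parameter choices here are illustrative assumptions, not the authors' method.

```python
import random

def plrs_schedule(base_lr, epochs, decay=0.95, lr_min=1e-4, seed=0):
    """Hypothetical probabilistic schedule (illustration only, not the
    paper's PLRS): each epoch's learning rate is sampled uniformly from
    an envelope [lr_min, lr_min + width_t], where width_t shrinks
    geometrically. Individual rates may rise or fall between epochs
    (non-monotone), but the envelope contracts over time."""
    rng = random.Random(seed)
    lrs = []
    width = base_lr - lr_min
    for _ in range(epochs):
        lrs.append(lr_min + rng.random() * width)
        width *= decay
    return lrs

# Sample a 100-epoch schedule: non-monotone step to step,
# but confined to a shrinking band around lr_min.
lrs = plrs_schedule(base_lr=0.1, epochs=100)
```

A schedule like this violates the classical "monotonically decreasing" assumption at every step, which is exactly the gap between scheduler practice and convergence theory that the abstract says the paper addresses.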
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 19287