Learning with Random Learning Rates.

Léonard Blier; Pierre Wolinski; Yann Ollivier

27 Sept 2018 (modified: 22 Jun 2025), ICLR 2019 Conference Blind Submission, Readers: Everyone

Abstract: Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the All Learning Rates At Once (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent (SGD) with Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems. Alrao could save time when testing deep learning models: a range of models could be quickly assessed with Alrao, and the most promising models could then be trained more extensively. This text comes with a PyTorch implementation of the method, which can be plugged into an existing PyTorch model.

Keywords: step size, stochastic gradient descent, hyperparameter tuning

TL;DR: We test stochastic gradient descent with random per-feature learning rates in neural networks, and find performance comparable to using SGD with the optimal learning rate, alleviating the need for learning rate tuning.
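
The core idea in the abstract, one learning rate per unit drawn from a distribution spanning several orders of magnitude, can be illustrated with a minimal sketch. This is not the authors' PyTorch implementation. The log-uniform distribution, the bounds `lr_min` and `lr_max`, and the helper names `sample_learning_rates` and `sgd_step` are all assumptions chosen for illustration; the abstract only states that the rates are random and cover several orders of magnitude.

```python
import math
import random


def sample_learning_rates(n_units, lr_min=1e-5, lr_max=10.0, rng=None):
    """Draw one learning rate per unit.

    Assumption: rates are log-uniform on [lr_min, lr_max], so every order
    of magnitude in the interval is equally likely. The bounds are
    illustrative defaults, not values taken from the paper.
    """
    rng = rng or random.Random()
    log_min, log_max = math.log(lr_min), math.log(lr_max)
    return [math.exp(rng.uniform(log_min, log_max)) for _ in range(n_units)]


def sgd_step(weights, grads, unit_lrs):
    """Apply one SGD step in which row i of `weights` (the incoming
    weights of unit i) is updated with its own rate unit_lrs[i]."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row, lr in zip(weights, grads, unit_lrs)
    ]


# Toy layer with 4 units and 2 inputs each (hypothetical values).
rng = random.Random(0)
unit_lrs = sample_learning_rates(4, rng=rng)
weights = [[0.5, -0.2], [0.1, 0.3], [0.0, 1.0], [-0.4, 0.7]]
grads = [[1.0, 1.0] for _ in range(4)]
new_weights = sgd_step(weights, grads, unit_lrs)
```

The rates are sampled once and then kept fixed, so the extra work per step is a single per-row multiplication, which is consistent with the abstract's claim of practically no computational cost.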

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/learning-with-random-learning-rates/code)
