Differentiable Self-Adaptive Learning Rate

Published: 28 Jan 2022, Last Modified: 13 Feb 2023
Venue: ICLR 2022 Submission
Keywords: self-adaptive learning rate
Abstract: Adaptive learning rates have been studied for a long time. During neural network training, the learning rate controls the stride and direction of each update in a multi-dimensional parameter space. A learning rate that is too large may cause training to fail to converge, while one that is too small makes convergence unacceptably slow. Although some optimizers adapt the learning rate during training, e.g., by using first-order and second-order momentum, their network parameters can still be unstable during training and often converge slowly. To address this problem, we propose a novel optimizer that makes the learning rate itself differentiable with respect to the objective of minimizing the loss function, thereby realizing a truly self-adaptive learning rate. We conducted extensive experiments on multiple network models and compared against various benchmark optimizers. The results show that our optimizer achieves fast, high-quality convergence within very few epochs, substantially faster than state-of-the-art optimizers.
One-sentence Summary: We achieve a truly self-adaptive learning rate by making it differentiable with respect to the objective of minimizing the loss function.
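
The abstract's central idea, an optimizer whose learning rate is itself adjusted by gradient descent on the loss, can be illustrated with a small sketch. The code below is a hypothetical illustration in the spirit of hypergradient descent, not the paper's exact algorithm; the function `step`, its arguments, the hyper-learning-rate `hyper_lr`, and the toy quadratic objective are all assumptions made for this example.

```python
import torch

# Hypothetical sketch of a differentiable (self-adapting) learning rate:
# the learning rate receives its own gradient step, using the hypergradient
# dL/d(lr) = -(g_t . g_{t-1}), the dot product of consecutive gradients.
def step(param, grad, prev_grad, lr, hyper_lr=1e-4):
    hypergrad = -(grad * prev_grad).sum()  # gradient of the loss w.r.t. lr
    lr = lr - hyper_lr * hypergrad         # adapt the learning rate itself
    param = param - lr * grad              # ordinary descent step with new lr
    return param, lr

# Toy usage: minimize f(x) = ||x - 3||^2 with a self-adapting learning rate.
x = torch.zeros(5)
lr = 0.01
prev_grad = torch.zeros_like(x)
for _ in range(200):
    grad = 2 * (x - 3.0)                   # analytic gradient of the quadratic
    x, lr = step(x, grad, prev_grad, lr)
    prev_grad = grad
print(x, lr)  # x approaches 3 while lr has adapted during training
```

In this sketch, when consecutive gradients point in the same direction the hypergradient increases the learning rate, and when they oppose it decreases it; this is one simple way a learning rate can be driven by the goal of minimizing the loss.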