Hyper-Regularization: An Adaptive Choice for the Learning Rate in Gradient Descent

Guangzeng Xie; Hao Jin; Dachao Lin; Zhihua Zhang

27 Sept 2018 (modified: 05 May 2023), ICLR 2019 Conference Blind Submission, Readers: Everyone

Abstract: We present a novel approach for adaptively selecting the learning rate in gradient descent methods. Specifically, we impose a regularization term on the learning rate via a generalized distance, and cast the joint update of the parameter and the learning rate as a max-min problem. Some existing schemes, such as AdaGrad (diagonal version) and WNGrad, can be rederived from our approach. The resulting update rules for the learning rate do not rely on the smoothness constant of the optimization problem and are robust to the initial learning rate. We analyze our approach theoretically in the full-batch and online learning settings, and it achieves performance comparable to other first-order gradient-based algorithms in terms of both accuracy and convergence rate.
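For orientation, the two existing schemes named in the abstract can be sketched as below. This is a minimal illustration of diagonal AdaGrad and a WNGrad-style update as they are commonly stated, not the paper's hyper-regularization derivation. The toy quadratic objective, the function names, the step-size defaults, and the ordering of the WNGrad update (scale first, then step) are all assumptions made for the example.

```python
import math


def adagrad_diag_step(x, g, accum, lr=0.5, eps=1e-8):
    """One diagonal AdaGrad step: each coordinate has its own learning rate."""
    # Accumulate squared gradients per coordinate.
    new_accum = [a + gi * gi for a, gi in zip(accum, g)]
    # Scale each coordinate's step by the root of its accumulated squares.
    new_x = [xi - lr * gi / (math.sqrt(a) + eps)
             for xi, gi, a in zip(x, g, new_accum)]
    return new_x, new_accum


def wngrad_step(x, g, b):
    """One WNGrad-style step: a single scalar b, with learning rate 1 / b."""
    sq_norm = sum(gi * gi for gi in g)
    new_b = b + sq_norm / b  # b never decreases, so the step size never grows
    new_x = [xi - gi / new_b for xi, gi in zip(x, g)]
    return new_x, new_b


def f(x):
    # Hypothetical test objective: f(x) = 0.5 * (x0^2 + 10 * x1^2).
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


def run_adagrad(x0, steps=200):
    x, accum = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        x, accum = adagrad_diag_step(x, grad(x), accum)
    return x


def run_wngrad(x0, steps=200, b0=1.0):
    x, b = list(x0), b0
    for _ in range(steps):
        x, b = wngrad_step(x, grad(x), b)
    return x, b
```

Neither update takes the smoothness constant of the objective as an input: the step size is driven entirely by the observed gradients, which is the property the abstract highlights.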

Keywords: Adaptive learning rate, novel framework

Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10)
