Adaptive Hierarchical Hyper-gradient Descent

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: learning rate adaptation, hyper-gradient descent, meta learning, optimisation, hierarchical system
Abstract: Adaptive learning rates can lead to faster convergence and better final performance for deep learning models. There are several widely known human-designed adaptive optimizers such as Adam and RMSProp, gradient-based adaptive methods such as hyper-descent and L4, and meta-learning approaches including learning to learn. However, the issue of balancing adaptiveness and over-parameterization is still a topic to be addressed. In this study, we investigate different levels of learning rate adaptation based on the framework of hyper-gradient descent, and further propose a method that adaptively learns the model parameters for combining different levels of adaptations. Meanwhile, we show the relationship between adding regularization on over-parameterized learning rates and building combinations of different levels of adaptive learning rates. Experiments on several network architectures, including feed-forward networks, LeNet-5 and ResNet-18/34, show that the proposed multi-level adaptive approach can outperform baseline adaptive methods in a variety of circumstances with statistical significance.
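For context on the underlying framework the abstract refers to, below is a minimal sketch of single-level hyper-gradient descent in the style of Baydin et al. (2018), where the scalar learning rate is itself updated by gradient descent using the dot product of consecutive gradients. The toy quadratic objective, variable names, and hyper learning rate value are illustrative assumptions, not the authors' code or their hierarchical extension.

import numpy as np

# Toy quadratic objective f(theta) = 0.5 * ||theta||^2, so grad(theta) = theta.
# (Illustrative choice; the paper applies the idea to deep networks.)
def grad(theta):
    return theta

theta = np.array([5.0, -3.0])   # model parameters
alpha = 0.01                    # learning rate, itself adapted below
beta = 1e-4                     # hyper learning rate for alpha (assumed value)
prev_grad = np.zeros_like(theta)

for step in range(100):
    g = grad(theta)
    # Hyper-gradient step: d f(theta_t) / d alpha = -g_t . g_{t-1},
    # so alpha is nudged in the direction that reduces the next loss.
    alpha += beta * np.dot(g, prev_grad)
    theta -= alpha * g
    prev_grad = g

The hierarchical method described in the abstract generalizes this idea by maintaining learning rate adaptation at multiple granularities (e.g. global, layer-wise, parameter-wise) and learning how to combine them, rather than adapting a single scalar as in the sketch above.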
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We introduce hierarchical learning rate adaptation with hyper-gradient descent for deep neural networks.
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=GFquR4vr43