MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: Meta Learning, Hyperparameter Learning, Generalization across Tasks, Optimization, LR Schedule Learning, DNN Training
Abstract: The learning rate (LR) is one of the most important hyperparameters in stochastic gradient descent (SGD) for deep neural network (DNN) training and generalization. However, hand-designed LR schedules must be pre-specified in a fixed form, which limits their ability to adapt to the highly varied training dynamics of non-convex optimization problems. Moreover, a proper LR schedule must be searched from scratch for each new task. To address these issues, we propose to parameterize LR schedules with an explicit mapping formulation, called MLR-SNet. Its learnable structure gives MLR-SNet the flexibility to learn an LR schedule that complies with the training dynamics of the DNN. Experiments on image and text classification benchmarks substantiate that our method learns proper LR schedules. Moreover, the meta-learned MLR-SNet is plug-and-play and tuning-free when generalizing to new heterogeneous tasks. We transfer the meta-trained MLR-SNet to tasks with different training epochs, network architectures, and datasets, including the large-scale ImageNet dataset, and achieve performance comparable to hand-designed LR schedules. Finally, MLR-SNet achieves better robustness when the training data are corrupted by noise.
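To make the idea concrete, below is a minimal sketch (in PyTorch) of how such a learnable LR-schedule network could be parameterized and plugged into a standard SGD training loop. The LSTM-based mapping, the layer sizes, the max-LR scaling, and all names here (`MLRSNetSketch`, `train_step`) are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class MLRSNetSketch(nn.Module):
    """Hypothetical sketch of a learnable LR-schedule network.

    Maps the current training loss, through a recurrent state, to a
    learning rate, so the schedule can adapt to training dynamics
    instead of following a fixed hand-designed form. All sizes and
    the max-LR scaling are illustrative assumptions.
    """

    def __init__(self, hidden_size=50, max_lr=0.1):
        super().__init__()
        self.cell = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.out = nn.Linear(hidden_size, 1)
        self.max_lr = max_lr
        self.state = None  # (h, c), carried across training steps

    def forward(self, loss_value):
        x = loss_value.detach().reshape(1, 1)  # scalar loss is the input
        self.state = self.cell(x, self.state)
        h, _ = self.state
        # Squash the output to (0, max_lr) so the predicted LR stays bounded.
        return self.max_lr * torch.sigmoid(self.out(h)).squeeze()

# Hypothetical plug-and-play usage: an (already meta-trained) schedule
# network replaces the hand-designed LR schedule in a plain SGD loop.
def train_step(model, sched_net, criterion, x, y):
    loss = criterion(model(x), y)
    model.zero_grad()
    loss.backward()
    lr = sched_net(loss)  # LR predicted from the current training dynamics
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
    return loss.item()
```

Because the schedule network conditions on the loss at every step, transferring it to a new task requires no LR search: the same meta-trained network simply reads the new task's training dynamics.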
One-sentence Summary: We propose MLR-SNet, a transferable LR schedule that is plug-and-play for adapting to heterogeneous tasks.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=6ZuZRLgXRe