Interleaving Optimizers for DNN Training

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Optimizer, DNN, HPO
Abstract: Optimizers are crucial in deep neural network (DNN) training, affecting model quality and convergence. Researchers have found that different optimizers often suit different problems or different stages of a problem. Hence, some studies have tried to combine different optimizers to better train DNNs. However, existing methods are limited to simple optimizer switching strategies, which leads to unstable model quality and slow convergence. In this paper, we propose a fine-grained optimizer switching method called Interleaving Optimizer for Model Training (IOMT), which automatically switches to the appropriate optimizer and hyperparameters based on training-stage information, achieving faster convergence and better model quality. IOMT employs surrogate models to estimate the performance of different optimizers during training and is supported by a transferability assessment to predict the training cost. By combining the transferability assessment, performance estimation, and training-process information with an acquisition function, IOMT calculates the optimization gain of each optimizer and switches to the optimizer with the largest gain for the next training stage. Experimental results on full training and fine-tuning demonstrate that IOMT achieves faster convergence (e.g., 10\% on the *stl10* dataset) and better performance (e.g., 3\% accuracy improvement on the *cifar10* dataset) compared to existing methods.
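To make the switching loop described in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' code): it replaces IOMT's surrogate models and transferability assessment with a simple bandit-style average of past loss improvements and a fixed per-optimizer cost, then uses a UCB-like acquisition score to pick the optimizer for each training stage. All names, hyperparameters, and the toy task are assumptions for illustration only.

```python
# Hypothetical sketch of interleaved optimizer switching in the spirit of IOMT.
# The "surrogate" here is just the mean observed improvement per optimizer;
# the paper's surrogate models and transferability assessment are not reproduced.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy task: fit a small MLP to random regression data (stand-in for DNN training).
X, y = torch.randn(512, 16), torch.randn(512, 1)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

# Candidate optimizers with assumed hyperparameters.
candidates = {
    "sgd":     lambda p: torch.optim.SGD(p, lr=0.05, momentum=0.9),
    "adam":    lambda p: torch.optim.Adam(p, lr=1e-3),
    "rmsprop": lambda p: torch.optim.RMSprop(p, lr=1e-3),
}
history = {name: [] for name in candidates}      # observed loss improvements per optimizer
stage_cost = {name: 1.0 for name in candidates}  # placeholder for a transferability/cost estimate

def acquisition(name, stage):
    """UCB-style gain estimate: mean past improvement plus exploration bonus, scaled by cost."""
    obs = history[name]
    if not obs:
        return float("inf")  # ensure each optimizer is tried at least once
    mean_gain = sum(obs) / len(obs)
    bonus = math.sqrt(2.0 * math.log(stage + 1) / len(obs))
    return (mean_gain + bonus) / stage_cost[name]

def run_stage(name, steps=50):
    """Train one stage with the chosen optimizer and return the observed loss improvement."""
    opt = candidates[name](model.parameters())  # simplification: optimizer state is not carried over
    start = loss_fn(model(X), y).item()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return start - loss_fn(model(X), y).item()

for stage in range(12):
    best = max(candidates, key=lambda n: acquisition(n, stage))
    gain = run_stage(best)
    history[best].append(gain)
    print(f"stage {stage:2d}: {best:8s} improvement {gain:.4f}")
```

In this sketch the exploration bonus plays the role of the acquisition function's uncertainty term, so early stages cycle through all candidates before the loop settles on whichever optimizer currently yields the largest estimated gain per unit cost.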
Supplementary Material: zip
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9286