Optimization as a Dynamical System: Generative Schedules from Latent ODEs

TMLR Paper8881 Authors

11 May 2026 (modified: 22 May 2026) | Under review for TMLR | CC BY 4.0
Abstract: We present a new meta-learning method for determining the optimal learning-rate schedule for gradient descent. It uses training runs from a hyperparameter search to learn a latent representation of the training process, which is modeled as a dynamical system. Given the current training metrics, it predicts the future learning-rate schedule with the best long-term validation performance. Our scheduler generalizes beyond previously observed training dynamics and creates specialized schedules that deviate noticeably from even the best-performing parametric functions. It outperforms all baselines we compare against on image classification with CNN and ResNet models, as well as on next-token prediction with a transformer model. Models trained with our scheduler lie in flatter regions of the loss landscape and therefore generalize better than those trained with other schedules. Our method is computationally efficient and optimizer-agnostic, and it can easily be layered on top of ML experiment-tracking platforms to streamline training neural networks from scratch.
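The abstract describes the scheduler only at a high level. It encodes the current training metrics into a latent state, rolls a dynamical system forward under candidate learning-rate schedules, and keeps the schedule with the best predicted long-term validation performance. The sketch below illustrates only that selection loop, under explicitly stated assumptions. Everything else is a hypothetical stand-in chosen for illustration, not the paper's learned latent ODE, encoder, or search procedure. This includes the toy dynamics `latent_rhs`, the decoder `decode_val_loss`, the constants `A` and `B`, the forward-Euler integration, and the exhaustive grid search over piecewise-constant schedules.

```python
import itertools
from typing import Sequence, Tuple

# HYPOTHETICAL toy constants: not taken from the paper.
A = 1.0  # how strongly a learning rate shrinks the latent "excess loss"
B = 0.5  # how much noise a learning rate injects (scales with lr**2)


def latent_rhs(z: float, lr: float) -> float:
    """Toy stand-in for a learned latent ODE: dz/dt = -A*lr*z + B*lr**2."""
    return -A * lr * z + B * lr ** 2


def rollout(z0: float, schedule: Sequence[float],
            steps_per_segment: int = 100, dt: float = 0.1) -> float:
    """Integrate the latent ODE with forward Euler under a piecewise-constant schedule."""
    z = z0
    for lr in schedule:
        for _ in range(steps_per_segment):
            z += dt * latent_rhs(z, lr)
    return z


def decode_val_loss(z: float) -> float:
    """Hypothetical decoder from latent state to predicted validation loss."""
    return 0.2 + z


def best_schedule(z0: float, lr_grid: Sequence[float],
                  num_segments: int) -> Tuple[float, ...]:
    """Score every piecewise-constant schedule on the grid; keep the lowest predicted loss."""
    candidates = itertools.product(lr_grid, repeat=num_segments)
    return min(candidates, key=lambda s: decode_val_loss(rollout(z0, s)))


if __name__ == "__main__":
    # z0 plays the role of the encoded "current training metrics" (assumed given).
    schedule = best_schedule(z0=2.0, lr_grid=[1.0, 0.3, 0.1], num_segments=3)
    print(schedule)
```

In this toy model a large learning rate reduces the latent loss quickly but settles at a higher noise floor (`B * lr / A`). The search therefore returns the decaying schedule (0.3, 0.1, 0.1), which beats every constant schedule on the grid. A real system would replace the hand-written right-hand side with a trained model and the grid search with whatever optimization the method actually uses.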
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Bruno_Loureiro1
Submission Number: 8881