Semi-Local Search for LR Schedules

18 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0
Keywords: stochastic optimization, learning rate schedule, learning to learn
Abstract: The learning rate schedule is a critical parameter of the optimization pipeline in modern machine learning. Unfortunately, searching for the optimal schedule is very difficult because the simple "local search" method of choosing the learning rate that does best on the very next iteration performs poorly in practice: industry-standard schedules such as cosine decay or warmup-stable-decay (WSD) trade off worse early performance for better final performance. We investigate the extent to which a "semi-local" search that looks only a few iterations ahead can rectify this problem, with the goal of designing an automated procedure to search for good learning rate schedules. Our experiments rigorously establish that simple greedy search methods fail to find optimal schedules, but that a limited amount of non-locality in the search can design better schedules.
Primary Area: optimization
Submission Number: 10323
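
Illustrative sketch (not the submission's algorithm): the abstract contrasts a greedy "local search", which picks the learning rate that minimizes the loss on the very next iteration, with a "semi-local" search that looks a few iterations ahead. The Python sketch below shows one way such a search could be written, assuming a toy deterministic quadratic objective, a small hypothetical grid of candidate learning rates, and a receding-horizon rule (plan `horizon` steps by exhaustive enumeration, commit only the first). All names (`CURVATURES`, `CANDIDATE_LRS`, `semi_local_schedule`) are invented for illustration; setting `horizon=1` recovers the greedy local search.

```python
import itertools

import numpy as np

# Assumption: a toy ill-conditioned quadratic, loss(x) = 0.5 * sum(h_i * x_i^2).
CURVATURES = np.array([1.0, 10.0])
# Assumption: a small hypothetical grid of candidate learning rates.
CANDIDATE_LRS = (0.01, 0.05, 0.1, 0.18)


def loss(x):
    """Quadratic loss at point x."""
    return 0.5 * float(np.sum(CURVATURES * x ** 2))


def gd_step(x, lr):
    """One gradient-descent step; the gradient of the quadratic is CURVATURES * x."""
    return x - lr * CURVATURES * x


def rollout(x, lrs):
    """Apply a sequence of learning rates starting from x and return the end point."""
    for lr in lrs:
        x = gd_step(x, lr)
    return x


def semi_local_schedule(x0, num_steps, horizon):
    """Build a schedule by looking `horizon` steps ahead at every iteration.

    At each step, enumerate every plan of up to `horizon` learning rates,
    keep the plan with the lowest loss at its end, and commit only its
    first learning rate. With horizon=1 this is the greedy local search.
    """
    x = np.asarray(x0, dtype=float)
    schedule = []
    for t in range(num_steps):
        h = min(horizon, num_steps - t)  # do not plan past the final step
        best_plan = min(
            itertools.product(CANDIDATE_LRS, repeat=h),
            key=lambda plan: loss(rollout(x, plan)),
        )
        schedule.append(best_plan[0])
        x = gd_step(x, best_plan[0])
    return schedule, loss(x)
```

For example, `semi_local_schedule([1.0, 1.0], num_steps=6, horizon=1)` returns the greedy schedule and its final loss, while `horizon=3` returns a semi-local one. The exhaustive enumeration costs `len(CANDIDATE_LRS) ** horizon` rollouts per step, so this form is practical only for short horizons and small grids.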