General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization
TL;DR: Through a general online-to-nonconvex conversion, we show that Schedule-Free SGD is also optimal for nonconvex, nonsmooth optimization.
Abstract: This work investigates the effectiveness of schedule-free methods, developed by A. Defazio et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable empirical success in training neural networks. Specifically, we show that schedule-free SGD achieves the optimal iteration complexity for nonsmooth, nonconvex optimization problems. Our proof begins with the development of a general framework for online-to-nonconvex conversion, which turns a given online learning algorithm into an optimization algorithm for nonconvex losses. Our general framework not only recovers existing conversions but also yields two novel conversion schemes. Notably, one of these new conversions corresponds directly to schedule-free SGD, which allows us to establish its optimality. Our analysis additionally provides valuable insights into the parameter choices for schedule-free SGD, addressing a theoretical gap that convex theory alone cannot explain.
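For context, below is a minimal sketch of the Schedule-Free SGD update analyzed in the abstract, assuming the standard form from Defazio et al. (NeurIPS 2024): a base iterate takes constant-step SGD steps, gradients are evaluated at an interpolation between the base iterate and a running average, and the average is what is returned. The names (`grad`, `x0`, `gamma`, `beta`, `T`) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def schedule_free_sgd(grad, x0, gamma=0.1, beta=0.9, T=1000):
    """Sketch of Schedule-Free SGD: no learning-rate schedule is used.

    z follows constant-step SGD with gradients taken at the interpolated
    point y; x is a uniform running average of the z iterates.
    """
    z = np.array(x0, dtype=float)  # base SGD iterate z_t
    x = z.copy()                   # averaged iterate x_t (returned to the user)
    for t in range(1, T + 1):
        y = (1 - beta) * z + beta * x   # gradient is evaluated at y_t
        z = z - gamma * grad(y)         # constant-step-size SGD update
        c = 1.0 / t                     # uniform averaging weight
        x = (1 - c) * x + c * z         # x_t = average of z_1, ..., z_t
    return x

# Illustrative use on a nonsmooth objective f(x) = |x| via its subgradient.
x_out = schedule_free_sgd(lambda y: np.sign(y), x0=[5.0], gamma=0.05, beta=0.9, T=2000)
```

This is a sketch only; the paper's contribution is the analysis showing that this recursion arises from a particular online-to-nonconvex conversion and attains the optimal iteration complexity in the nonsmooth, nonconvex setting.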
Lay Summary: When training neural networks, we adjust the model’s internal settings — called parameters — as we see new data. If we adjust too cautiously, learning is slow. If we adjust too aggressively, learning can become unstable. To avoid this, researchers often tune a value called the "learning rate" using hand-crafted schedules, which can be time-consuming and problem-specific.
A recent method called “schedule-free” learning avoids this hassle by removing the need for manual scheduling altogether, and yet performs remarkably well in practice.
In this work, we explain why schedule-free learning works, even in challenging cases where the learning landscape is highly irregular. We develop a general mathematical tool to turn any online learning algorithm (which learns from data sequentially) into one that can solve difficult optimization problems like those found in neural network training. This also allows us to recover and improve existing methods — including schedule-free learning — and to explain why certain parameter choices, commonly used in practice, are effective even though previous theory couldn’t justify them.
Primary Area: Theory->Optimization
Keywords: Schedule-free optimizer, non-convex optimization, online-to-nonconvex conversion
Submission Number: 4886