A Generalization Result for Convergence in Learning-to-Optimize

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Oral · CC BY 4.0
TL;DR: We present a generalization theorem that allows for establishing the convergence of learned optimization algorithms to critical points with high probability.
Abstract: Learning-to-optimize leverages machine learning to accelerate optimization algorithms. While empirical results show tremendous improvements compared to classical optimization algorithms, theoretical guarantees are mostly lacking, so the outcome cannot be reliably assured. In particular, convergence is rarely studied in learning-to-optimize, because conventional convergence guarantees in optimization rest on geometric arguments, which cannot easily be applied to learned algorithms. Thus, we develop a probabilistic framework that resembles classical optimization and allows for transferring geometric arguments into learning-to-optimize. Based on our new proof strategy, our main theorem is a generalization result for parametric classes of potentially non-smooth, non-convex loss functions; it establishes the convergence of learned optimization algorithms to critical points with high probability. This effectively generalizes the results of a worst-case analysis into a probabilistic framework and frees the design of the learned algorithm from the need for safeguards.
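To illustrate the type of guarantee described above, the following is a schematic high-probability convergence statement; the notation (the loss ℓ, the iterates x^(k), the subdifferential ∂ℓ, and the failure probability ε) is purely illustrative and does not reproduce the paper's actual theorem.

```latex
% Illustrative sketch only, not the paper's theorem or notation:
% \ell is a loss drawn from the distribution of problem instances,
% (x^{(k)})_{k \in \mathbb{N}} are the iterates of the learned algorithm,
% \partial \ell is a suitable subdifferential, and \varepsilon \in (0,1)
% is the admissible failure probability.
\[
  \mathbb{P}_{\ell}\!\left(
    \liminf_{k \to \infty} \operatorname{dist}\!\bigl(0, \partial \ell(x^{(k)})\bigr) = 0
  \right) \;\geq\; 1 - \varepsilon .
\]
% Read: with probability at least 1 - \varepsilon over a new problem instance,
% the iterates of the learned algorithm accumulate at critical points of \ell.
```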
Lay Summary: Traditional optimization algorithms can be slow, so we aim to accelerate them using machine learning. However, learned algorithms can make mistakes, whereas traditional ones provably work in the intended way. To “work in the intended way” means that if we run the algorithm for long enough, its final output gets closer and closer to a point that can serve as a solution to the optimization problem, a property technically called “convergence”. This type of behavior is hard to prove for learned algorithms, because they are trained on a data set and might over-adapt to it. Additionally, known results from the optimization literature do not apply to this setting. Thus, we tackle this problem from a new, statistical perspective and show that, if the algorithm works in the intended way during training, it is highly likely to do so on unseen problems as well, a property technically called “generalization”. Moreover, while our paper exemplifies the underlying idea for the problem of convergence in learning-to-optimize, our new proof strategy applies more generally to algorithms that obey a certain structural property and allows for showing that their overall behavior generalizes from training to test data.
Primary Area: Theory->Optimization
Keywords: learning-to-optimize, non-smooth non-convex optimization, PAC-Bayesian guarantees, convergence
Submission Number: 6618