Keywords: Learn to Optimize, Gradient Descent, Neural Tangent Kernel
TL;DR: This paper proves the training convergence of Learn to Optimize (L2O).
Abstract: Learn to Optimize (L2O) trains deep neural network-based solvers for optimization, achieving success in accelerating convex optimization and improving solutions to non-convex problems. However, L2O lacks rigorous theoretical backing for its own training convergence, as existing analyses often rely on unrealistic assumptions, a gap this work highlights empirically. We bridge this gap by proving the training convergence of L2O models that learn Gradient Descent (GD) hyperparameters for quadratic programming, leveraging Neural Tangent Kernel (NTK) theory. We propose a deterministic initialization strategy that supports our theoretical results and promotes stable training over extended optimization horizons by mitigating gradient explosion.
Our L2O framework demonstrates over 50% better optimality than GD and greater robustness than state-of-the-art L2O methods on synthetic datasets.
The code for our method is available at https://github.com/NetX-lab/MathL2OProof-Official.
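To illustrate the setup described in the abstract, below is a minimal, hypothetical sketch of an L2O model that learns a GD hyperparameter (the step size) for quadratic programming by unrolling GD and backpropagating through the final objective. All names (StepSizeNet, unroll_gd) and architectural choices are illustrative assumptions, not the paper's actual model, initialization strategy, or released code.

```python
# Minimal sketch (assumed, not the paper's implementation): a small network
# predicts a positive GD step size from the current gradient, GD is unrolled
# on a quadratic objective 0.5 x^T A x - b^T x, and the network is trained
# to minimize the final objective value over random problem instances.
import torch
import torch.nn as nn

class StepSizeNet(nn.Module):
    """Maps the current gradient to a positive GD step size."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Softplus(),  # keep the step size positive
        )

    def forward(self, grad):
        return self.net(grad)

def unroll_gd(model, A, b, x0, steps=20):
    """Run GD with learned step sizes and return the final objective value."""
    x = x0
    for _ in range(steps):
        grad = A @ x - b                 # gradient of 0.5 x^T A x - b^T x
        eta = model(grad).squeeze()      # learned hyperparameter (step size)
        x = x - eta * grad
    return 0.5 * x @ (A @ x) - b @ x

dim = 10
model = StepSizeNet(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for it in range(200):
    M = torch.randn(dim, dim)
    A = M @ M.T + 0.1 * torch.eye(dim)   # random positive-definite quadratic
    b = torch.randn(dim)
    loss = unroll_gd(model, A, b, torch.zeros(dim))
    opt.zero_grad()
    loss.backward()                      # backpropagate through the unrolled GD
    opt.step()
```

The sketch uses standard random PyTorch initialization; the paper instead proposes a deterministic initialization to keep training stable over long unrolled horizons, which is not reproduced here.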
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 913