Automatic Differentiation of Optimization Algorithms with Time-Varying Updates

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Numerous optimization algorithms have a time-varying update rule thanks to, for instance, a changing step size, momentum parameter, or Hessian approximation. Often, such algorithms are used as solvers for the lower-level problem in bilevel optimization, and are unrolled when computing the gradient of the upper-level objective. In this paper, we apply unrolled or automatic differentiation to a time-varying iterative process and provide convergence (rate) guarantees for the resulting derivative iterates. We then adapt these convergence results and apply them to proximal gradient descent with variable step size and to FISTA when solving partly smooth problems. We test the convergence (rates) of these algorithms numerically through several experiments. Our theoretical and numerical results show that the convergence rate of the algorithm is reflected in its derivative iterates.
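To illustrate the kind of computation the abstract refers to, the following is a minimal sketch (not the authors' implementation) of unrolled automatic differentiation through proximal gradient descent with a time-varying step size on a LASSO-type, partly smooth lower-level problem. The step-size schedule, problem data, and iteration count are hypothetical choices made for the example; `jax.jacobian` differentiates the final iterate with respect to the hyperparameter `theta` by unrolling the loop.

```python
import jax
import jax.numpy as jnp

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1
    return jnp.sign(x) * jnp.maximum(jnp.abs(x) - tau, 0.0)

def unrolled_prox_grad(theta, A, b, num_iters=100):
    """Proximal gradient descent on 0.5*||A x - b||^2 + theta*||x||_1
    with a time-varying step size; the Python loop is traced, so
    differentiation unrolls the iterations."""
    L = jnp.linalg.norm(A, ord=2) ** 2            # Lipschitz constant of the smooth part
    x = jnp.zeros(A.shape[1])
    for k in range(num_iters):
        step = 1.0 / (L * (1.0 + 1.0 / (k + 1)))  # hypothetical time-varying step size
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * theta)
    return x

# Derivative of the final iterate with respect to theta,
# obtained by automatic differentiation through the unrolled iterations.
A = jax.random.normal(jax.random.PRNGKey(0), (20, 10))
b = jax.random.normal(jax.random.PRNGKey(1), (20,))
dx_dtheta = jax.jacobian(unrolled_prox_grad)(0.1, A, b)
print(dx_dtheta.shape)  # (10,): sensitivity of each coordinate of x_K to theta
```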
Lay Summary: Optimization problems involve finding solutions that minimize costs or losses, or maximize profits. Optimization algorithms help computers find such solutions by improving their guesses step by step. Many of these algorithms become more effective by changing how they update their guesses over time, for example, by adjusting the length of a step they take. This flexibility allows them to improve the guess more quickly, that is, in fewer steps. In many real-world problems, the objective we are optimizing depends on external parameters. For instance, a company might want to minimize production costs, but those costs depend on raw material prices or market demand. As these parameters vary, both the optimal solution and the algorithm’s intermediate guesses may also vary. Understanding how these variations happen is essential, and this is captured mathematically using derivatives. In this paper, we study how the variation in an optimization algorithm’s guess relates to the variation in the true solution. In fact, we show that, in certain cases, the derivative of the algorithm’s guess itself serves as a good approximation of the derivative of the solution. Our results have applications in areas such as Meta Learning and Hyperparameter Optimization.
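To make the lay summary's claim concrete on a toy problem (an assumption-laden example, not a result from the paper), the sketch below compares the derivative of an unrolled gradient-descent iterate with the derivative of the exact solution for a small quadratic objective 0.5 xᵀQx - θ bᵀx, whose solution x*(θ) = θ Q⁻¹b has derivative Q⁻¹b. The matrix `Q`, vector `b`, and decreasing step-size schedule are hypothetical.

```python
import jax
import jax.numpy as jnp

Q = jnp.array([[3.0, 1.0], [1.0, 2.0]])   # strongly convex quadratic (toy data)
b = jnp.array([1.0, -1.0])

def unrolled_gd(theta, num_iters=50):
    """Gradient descent on f(x) = 0.5 x^T Q x - theta * b^T x
    with a (hypothetical) time-varying step size."""
    x = jnp.zeros(2)
    for k in range(num_iters):
        step = 1.0 / (3.0 + 1.0 / (k + 1))  # decreasing step size, below 2/L
        x = x - step * (Q @ x - theta * b)
    return x

theta = 0.7
# Derivative of the K-th iterate with respect to theta (unrolled AD) ...
dxk = jax.jacobian(unrolled_gd)(theta)
# ... versus the derivative of the exact solution x*(theta) = theta * Q^{-1} b.
dxstar = jnp.linalg.solve(Q, b)
print(jnp.linalg.norm(dxk - dxstar))  # small: the derivative iterate tracks dx*/dtheta
```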
Primary Area: Optimization
Keywords: Bilevel Optimization, Algorithm Unrolling, Automatic Differentiation, Machine Learning
Submission Number: 7236