The Curse of Unrolling: Rate of Differentiating Through Optimization

Damien Scieur; Gauthier Gidel; Quentin Bertrand; Fabian Pedregosa

The Curse of Unrolling: Rate of Differentiating Through Optimization

Damien Scieur, Gauthier Gidel, Quentin Bertrand, Fabian Pedregosa

Published: 31 Oct 2022, Last Modified: 14 Jan 2023NeurIPS 2022 AcceptReaders: Everyone

Keywords: implicit differentiation, unrolling, optimization, bi-level, meta-learning, sobolev

Abstract: Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it through the computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that to ensure convergence of the Jacobian, we can either 1) choose a large learning rate leading to a fast asymptotic convergence but accept that the algorithm may have an arbitrarily long burn-in phase or 2) choose a smaller learning rate leading to an immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems relative to this approach, such as deriving a practical update rule for the optimal unrolling strategy and making novel connections with the field of Sobolev orthogonal polynomials.

TL;DR: We analyse the convergence rate of classical optimization algorithm for unrolled differentiation of quadratic functions. Then, we propose an accelerated method based on Sobolev polynomials.

Supplementary Material: pdf

9 Replies

Loading