TL;DR: We propose a new algorithmic framework for bilevel optimization that incorporates curvature information into hypergradient approximations, achieving improved computational complexity and significant practical performance gains.
Abstract: Bilevel optimization is a powerful tool for many machine learning problems, such as hyperparameter optimization and meta-learning. Estimating hypergradients (also known as implicit gradients) is crucial for developing gradient-based methods for bilevel optimization. In this work, we propose a computationally efficient technique for incorporating curvature information into the approximation of hypergradients and present a novel algorithmic framework based on the resulting enhanced hypergradient computation. We provide convergence rate guarantees for the proposed framework in both deterministic and stochastic scenarios, particularly showing improved computational complexity over popular gradient-based methods in the deterministic setting. This improvement in complexity arises from a careful exploitation of the hypergradient structure and the inexact Newton method. In addition to the theoretical speedup, numerical experiments demonstrate the significant practical performance benefits of incorporating curvature information.
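For context, the hypergradient referred to in the abstract is usually derived via the implicit function theorem. In generic notation (upper-level objective $f$, lower-level objective $g$, lower-level solution $y^*(x)$; not necessarily the paper's own symbols), it reads

$$
\nabla F(x) = \nabla_x f\big(x, y^*(x)\big) - \nabla^2_{xy} g\big(x, y^*(x)\big)\,\big[\nabla^2_{yy} g\big(x, y^*(x)\big)\big]^{-1} \nabla_y f\big(x, y^*(x)\big),
\qquad y^*(x) \in \operatorname*{arg\,min}_y g(x, y).
$$

The lower-level Hessian $\nabla^2_{yy} g$ is where curvature enters, and the inexact Newton idea is to approximate the linear solve against it (for instance with a few conjugate-gradient steps that only require Hessian-vector products) rather than forming or inverting the Hessian explicitly.

The sketch below illustrates this standard implicit-gradient recipe on a toy quadratic lower-level problem. It is a minimal illustration under invented problem data (A, B, c), not the paper's algorithm or experiments.

```python
import numpy as np

# Toy bilevel problem (all data below is invented for illustration):
#   upper level : F(x) = f(x, y*(x)) = 0.5 * || y*(x) - c ||^2
#   lower level : y*(x) = argmin_y g(x, y),  g(x, y) = 0.5 * y^T A y - (B x)^T y
# For this g: grad_y g = A y - B x  =>  y*(x) = A^{-1} B x,
#             Hessian_yy g = A,     cross term grad^2_xy g = -B^T.

rng = np.random.default_rng(0)
n, m = 5, 4                                   # dimensions of y and x
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                   # symmetric positive definite Hessian
B = rng.standard_normal((n, m))
c = rng.standard_normal(n)
x = rng.standard_normal(m)

def conjugate_gradient(hvp, b, iters=20, tol=1e-10):
    """Inexactly solve H v = b using only Hessian-vector products hvp(u) = H u."""
    v = np.zeros_like(b)
    r = b - hvp(v)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        v = v + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return v

# Step 1: solve (here exactly) the lower-level problem.
y = np.linalg.solve(A, B @ x)                 # y*(x)

# Step 2: inexact Newton-type linear solve against the lower-level Hessian.
grad_y_f = y - c                              # d f / d y at (x, y*(x))
grad_x_f = np.zeros(m)                        # f has no direct dependence on x here
v = conjugate_gradient(lambda u: A @ u, grad_y_f)

# Step 3: assemble the hypergradient  grad_x f - grad^2_xy g @ v  (grad^2_xy g = -B^T).
hypergrad = grad_x_f + B.T @ v

# Sanity check against the closed form  B^T A^{-1} (y*(x) - c)  for this toy problem.
print(np.allclose(hypergrad, B.T @ np.linalg.solve(A, y - c)))   # True
```

For this quadratic lower level, the closed-form hypergradient is $B^\top A^{-1}(y^*(x) - c)$, which the CG-based estimate matches to numerical precision; in realistic problems the Hessian-vector products would come from automatic differentiation rather than an explicit matrix.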
Lay Summary: Many machine learning tasks involve tuning settings while training the model—like learning how to teach while also teaching. This setup, known as bilevel optimization, is common in areas like choosing hyperparameters or learning how to learn.
To make this process work well, it’s important to estimate how changing one part (like the settings) affects another (like the training). In this work, we developed a faster way to do that by capturing not just the direction of change, but also how sharply things change—like adjusting not just the steering wheel but also feeling the slope of the road. A traditional mathematical tool called the Newton method helped guide this improvement.
We created a new method that’s both efficient and reliable, and we proved that it performs better than many existing approaches—especially when things are predictable. These improvements are not just on paper; experiments show it really works in practice. Our findings make bilevel optimization faster and more practical, helping machine learning systems learn and adapt more smoothly.
Primary Area: Optimization
Keywords: bilevel optimization, hypergradient, inexact Newton method, computational complexity
Submission Number: 4739