Keywords: Bilevel optimization, stochastic optimization, lazy second-order information evaluation
Abstract: Bilevel optimization has attracted significant attention recently due to its applicability in various large-scale machine learning tasks (e.g., the large language model (LLM) pretraining-finetuning pipeline).
In the literature, one popular approach for solving bilevel optimization problems is to use hypergradient-based methods.
However, computing the hypergradients requires evaluating second-order information (Hessians/Jacobians) of the lower-level objective function, which is computationally expensive.
To address this challenge, we propose SO-Lazy-BiO (**S**econd-**O**rder **Lazy** **Bi**level **O**ptimization), an algorithmic framework that significantly accelerates the state-of-the-art (SOTA) bilevel optimization methods by allowing *infrequent* evaluation of second-order information.
We theoretically establish the performance of SO-Lazy-BiO and show that, despite the additional error incurred by the infrequent evaluation of second-order information, SO-Lazy-BiO *surprisingly* matches the computational complexity of existing non-lazy bilevel algorithms while requiring *fewer* second-order information evaluations.
This leads to substantial savings in both computational cost and wall-clock running time.
We further conduct extensive experiments to demonstrate that SO-Lazy-BiO achieves significant gains in numerical performance over SOTA methods, especially on large-scale tasks.
To our knowledge, this is the first work to employ infrequent second-order computations while still guaranteeing the convergence of stochastic bilevel algorithms.
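For context on where the second-order terms mentioned in the abstract arise, the standard implicit-function-theorem expression for the hypergradient (written in the generic notation of the bilevel literature; the paper's own formulation may differ) is

$$
\nabla \Phi(x) \;=\; \nabla_x f\bigl(x, y^*(x)\bigr) \;-\; \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)\,\bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}\,\nabla_y f\bigl(x, y^*(x)\bigr),
\qquad
y^*(x) \in \arg\min_y g(x, y),
$$

where $f$ and $g$ denote the upper- and lower-level objectives. The mixed Jacobian $\nabla^2_{xy} g$ and the (inverse of the) lower-level Hessian $\nabla^2_{yy} g$ are the second-order quantities whose evaluation dominates the cost of hypergradient-based methods.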
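The sketch below is a minimal, hypothetical illustration of the generic "lazy second-order" idea described in the abstract, i.e., refreshing the Hessian/Jacobian terms only every $K$ outer iterations and reusing stale values in between. It is **not** the paper's SO-Lazy-BiO algorithm: the toy bilevel problem, the exact lower-level solve, and the refresh rule are all illustrative assumptions.

```python
# Hypothetical toy sketch (NOT the paper's SO-Lazy-BiO algorithm): illustrates the
# generic idea of reusing ("lazily" refreshing) second-order information when
# estimating hypergradients via implicit differentiation.
#
# Toy bilevel problem (all choices here are illustrative assumptions):
#   lower level:  g(x, y) = 0.5*||A y - b||^2 + 0.5*x*||y||^2   (strongly convex in y for x > 0)
#   upper level:  f(x, y) = 0.5*||y - y_val||^2
# Implicit-function-theorem hypergradient:
#   dPhi/dx = grad_x f - grad_xy g @ inv(grad_yy g) @ grad_y f
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
y_val = rng.normal(size=d)                    # target used by the upper-level loss

def lower_solution(x):
    """Exact lower-level minimizer y*(x) (closed form for this quadratic toy)."""
    return np.linalg.solve(A.T @ A + x * np.eye(d), A.T @ b)

def hypergradient(y, H, cross):
    """Implicit-differentiation hypergradient using (possibly stale) second-order terms.

    H     : lower-level Hessian  grad_yy g  (d x d)
    cross : mixed derivative     grad_xy g  (a length-d vector, since x is scalar here)
    """
    grad_y_f = y - y_val                      # upper-level gradient w.r.t. y
    v = np.linalg.solve(H, grad_y_f)          # Hessian-inverse-vector product
    return -cross @ v                         # grad_x f = 0 in this toy problem

x, step, K = 1.0, 0.05, 5                     # K = refresh period for second-order info
H, cross = None, None
for t in range(100):
    y = lower_solution(x)
    if t % K == 0:                            # "lazy" refresh: recompute Hessian/Jacobian
        H = A.T @ A + x * np.eye(d)           #   grad_yy g(x, y)
        cross = y.copy()                      #   grad_xy g(x, y) = d/dx of grad_y g = y
    g = hypergradient(y, H, cross)            # between refreshes, H and cross are stale
    x = max(x - step * g, 1e-3)               # keep the regularization weight positive

print(f"final x = {x:.4f}, upper loss = {0.5*np.sum((lower_solution(x)-y_val)**2):.4f}")
```

In this toy example the second-order terms are cheap, so the savings are only conceptual; the abstract's point is that for large-scale models, skipping most Hessian/Jacobian evaluations is where the computational and wall-clock gains come from.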
Supplementary Material: zip
Primary Area: optimization
Submission Number: 15198