SO-Lazy-BiO: Accelerating Bilevel Optimization with Reduced Second-Order Information Computation

ICLR 2026 Conference Submission 15198 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Bilevel optimization, stochastic optimization, lazy second-order information evaluation
Abstract: Bilevel optimization has attracted significant attention recently due to its applicability in various large-scale machine learning tasks (e.g., the large language model (LLM) pretraining-finetuning pipeline). In the literature, one popular approach for solving bilevel optimization problems is to use hypergradient-based methods. However, computing the hypergradients requires evaluating second-order information (Hessians/Jacobians) of the lower-level objective function, which is computationally expensive. To address this challenge, we propose SO-Lazy-BiO (**S**econd-**O**rder **Lazy** **Bi**level **O**ptimization), an algorithmic framework that significantly accelerates state-of-the-art (SOTA) bilevel optimization methods by allowing *infrequent* evaluation of second-order information. We theoretically establish the performance of SO-Lazy-BiO and show that, despite the additional errors incurred by the infrequent evaluations of second-order information, SO-Lazy-BiO *surprisingly* matches the computational complexity of existing non-lazy bilevel algorithms, while requiring *fewer* second-order information evaluations. This leads to substantial savings in both computational cost and wall-clock running time. We further conduct extensive experiments to demonstrate that SO-Lazy-BiO enjoys significant gains in numerical performance compared to SOTA methods, especially on large-scale tasks. To our knowledge, this is the first work to employ infrequent second-order computations while still guaranteeing the convergence of stochastic bilevel algorithms.
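To illustrate the idea described in the abstract, the sketch below shows a hypergradient-based bilevel loop in which the second-order information (the lower-level Hessian and cross Jacobian) is re-evaluated only every few iterations and a stale cache is reused in between. This is a minimal illustrative example on a toy quadratic bilevel problem, not the authors' SO-Lazy-BiO implementation; all names (`A`, `B`, `refresh_period`, step sizes) are assumptions made for the example.

```python
import numpy as np

# Toy bilevel problem (assumed for illustration only):
#   lower level: g(x, y) = 0.5 * y'Ay - y'Bx          (strongly convex in y)
#   upper level: f(x, y) = 0.5*||y - y_target||^2 + 0.5*lam*||x||^2
# Hypergradient: grad_x f - (d^2 g/dx dy) [d^2 g/dy^2]^{-1} grad_y f.
rng = np.random.default_rng(0)
dx, dy = 5, 5
A = np.eye(dy) + 0.1 * rng.standard_normal((dy, dy)); A = A @ A.T   # lower-level Hessian in y (SPD)
B = rng.standard_normal((dy, dx))                                    # coupling term
y_target = rng.standard_normal(dy); lam = 1e-2

x = rng.standard_normal(dx); y = np.zeros(dy)
alpha, beta = 0.05, 0.05         # lower-/upper-level step sizes (assumed)
refresh_period = 10              # "lazy" schedule: second-order info evaluated every 10 steps
H_yy = J_xy = None               # cached second-order information

for t in range(200):
    # Lower-level update uses first-order information only.
    grad_y_g = A @ y - B @ x
    y = y - alpha * grad_y_g

    # Lazily refresh the Hessian/Jacobian; between refreshes the stale cache is reused.
    if t % refresh_period == 0:
        H_yy = A          # d^2 g / dy^2  (constant here; in general re-evaluated at (x, y))
        J_xy = -B.T       # d^2 g / dx dy

    # Hypergradient assembled from the cached second-order information.
    grad_y_f = y - y_target
    grad_x_f = lam * x
    hypergrad = grad_x_f - J_xy @ np.linalg.solve(H_yy, grad_y_f)
    x = x - beta * hypergrad

print("upper-level objective:",
      0.5 * np.sum((y - y_target) ** 2) + 0.5 * lam * np.sum(x ** 2))
```

In this sketch the per-iteration cost between refreshes is purely first-order; the paper's claim is that such infrequent second-order evaluation can be done without degrading the overall computational complexity.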
Supplementary Material: zip
Primary Area: optimization
Submission Number: 15198