Bridging Constraints and Stochasticity: A Fully First-Order Method for Stochastic Bilevel Optimization with Linear Constraints

Bridging Constraints and Stochasticity: A Fully First-Order Method for Stochastic Bilevel Optimization with Linear Constraints

ICLR 2026 Conference Submission21092 Authors

19 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: stochastic bilevel optimization, linear constraints, first-order methods, penalty method, hypergradient approximation, stochastic oracles, bias–variance analysis, Goldstein stationarity, nonsmooth optimization, convergence guarantees, SFO complexity, Lipschitz & strong convexity, KKT/duality, variance reduction (SVRG), meta-learning, hyperparameter optimization, reinforcement learning, constrained lower-level problem, perturbed gradient, penalty Lagrangian

Abstract: This work provides the first finite-time convergence guarantees for linearly constrained stochastic bilevel optimization using only first-order methods—requiring solely gradient information without any Hessian computations or second-order derivatives. We address the unprecedented challenge of simultaneously handling linear constraints, stochastic noise, and finite-time analysis in bilevel optimization, a combination that has remained theoretically intractable until now. While existing approaches either require second-order information, handle only unconstrained stochastic problems, or provide merely asymptotic convergence results, our method achieves finite-time guarantees using gradient-based techniques alone. We develop a novel penalty-based framework that constructs hypergradient approximations via smoothed penalty functions, using approximate primal and dual solutions to overcome the fundamental challenges posed by the interaction between linear constraints and stochastic noise. Our theoretical analysis provides explicit finite-time bounds on the bias and variance of the hypergradient estimator, demonstrating how approximation errors interact with stochastic perturbations. We prove that our first-order algorithm converges to $(\delta, \epsilon)$-Goldstein stationary points using $\Theta(\delta^{-1}\epsilon^{-5})$ stochastic gradient evaluations, establishing the first finite-time complexity result for this challenging problem class and representing a significant theoretical breakthrough in constrained stochastic bilevel optimization.

Primary Area: optimization

Submission Number: 21092

Loading