Keywords: Online Bilevel Optimization, Bregman Geometry, Time-Smoothed Hypergradients (Variance Reduction)
TL;DR: We introduce Bregman-based algorithms for stochastic online bilevel optimization that eliminate condition-number dependence and show time-smoothed hypergradients reduce variance, validated on preconditioner learning and RL.
Abstract: We study *online bilevel optimization (OBO)* in the *stochastic* setting and ask whether geometry can eliminate the severe dependence on the condition number of the inner problem, $\kappa_g = \ell_{g,1}/\mu_g$. We introduce a family of *Bregman-based algorithms* and analyze both oracle and practical regimes. In the oracle setting, where exact hypergradients are available, generalized Bregman steps achieve sublinear bilevel local regret (i.e., $o(T)$) while *removing the cubic dependence on $\kappa_g$* incurred by Euclidean updates. In the practical stochastic setting, where hypergradients must be estimated, we design single-loop, sample-efficient algorithms that combine Bregman steps with time-smoothed hypergradient estimates. Our analysis shows that Bregman geometry again eliminates the $\kappa_g$-dependence and yields guarantees of sublinear bilevel local regret in this setting. It further reveals a broader insight: time smoothing, previously treated as a heuristic in deterministic OBO, naturally functions as a *variance-reduction mechanism* while keeping bias controlled, clarifying its role across both regimes. Finally, experiments on preconditioner learning and reinforcement learning support our theoretical findings across a variety of nonstationary loss sequences and large-scale, ill-conditioned datasets.
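The following is a minimal, illustrative sketch (not the paper's stated algorithm) of the two ingredients the abstract names: a Bregman outer step and a time-smoothed stochastic hypergradient. The entropic mirror map on the simplex, the helper callables `inner_grad` and `hypergrad_estimate`, the window size, and the step sizes are all assumptions made purely for illustration; the paper's actual estimator and geometry are not specified here.

```python
# Minimal sketch of a single-loop online bilevel update combining
# (i) a Bregman / mirror-descent outer step and
# (ii) a time-smoothed (window-averaged) stochastic hypergradient.
# Placeholder names (inner_grad, hypergrad_estimate, entropic mirror map)
# are illustrative assumptions, not the paper's algorithm.

from collections import deque
import numpy as np


def entropic_bregman_step(x, d, eta):
    """Mirror-descent step with the negative-entropy mirror map on the simplex:
    x_{t+1} = argmin_x  eta*<d, x> + D_psi(x, x_t),  psi(x) = sum_i x_i log x_i,
    whose closed form is x_{t+1} proportional to x_t * exp(-eta * d)."""
    z = x * np.exp(-eta * (d - d.min()))   # shift so the exponent is <= 0 (stability)
    return z / z.sum()


def online_bilevel_mirror(x0, y0, rounds, window, eta_out, eta_in,
                          inner_grad, hypergrad_estimate, inner_steps=5):
    """Single-loop OBO sketch.
    inner_grad(t, x, y): stochastic gradient of the inner loss g_t in y.
    hypergrad_estimate(t, x, y): stochastic hypergradient of the outer loss F_t at x.
    The outer direction averages hypergradient estimates of the last `window`
    losses at the current iterate (time smoothing), which lowers variance."""
    x, y = x0.copy(), y0.copy()
    recent = deque(maxlen=window)          # indices of the most recent outer losses
    for t in range(rounds):
        # Approximately track y*_t(x) with a few stochastic inner steps.
        for _ in range(inner_steps):
            y = y - eta_in * inner_grad(t, x, y)
        recent.append(t)
        # Time-smoothed stochastic hypergradient over the window.
        d = np.mean([hypergrad_estimate(s, x, y) for s in recent], axis=0)
        # Bregman (entropic mirror) outer step in place of a Euclidean gradient step.
        x = entropic_bregman_step(x, d, eta_out)
    return x, y
```

In the Euclidean special case the mirror step reduces to `x - eta_out * d`; swapping in a non-Euclidean mirror map is what the abstract refers to as exploiting Bregman geometry, and the window average is the time smoothing it interprets as a variance-reduction mechanism.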
Primary Area: optimization
Submission Number: 20818