Keywords: Online Bilevel Optimization, Bregman Geometry, Time-Smoothed Hypergradients (Variance Reduction)
TL;DR: We introduce Bregman-based algorithms for stochastic online bilevel optimization that eliminate condition-number dependence and show time-smoothed hypergradients reduce variance, validated on preconditioner learning and RL.
Abstract: We study *online bilevel optimization (OBO)* in the *stochastic* setting and ask whether geometry can eliminate the severe dependence on the condition number of the inner problem, $\kappa_g = \ell_{g,1}/\mu_g$. We introduce a family of *Bregman-based algorithms* and analyze both oracle and practical regimes. In the oracle setting, where exact hypergradients are available, generalized Bregman steps achieve sublinear bilevel local regret (i.e., $o(T)$) while *removing the cubic dependence on $\kappa_g$* incurred by Euclidean updates. In the practical stochastic setting, where hypergradients must be estimated, we design single-loop, sample-efficient algorithms that combine Bregman steps with time-smoothed hypergradient estimates. Our analysis shows that Bregman geometry again eliminates the $\kappa_g$-dependence and yields guarantees of sublinear bilevel local regret in this setting. It further reveals a broader insight: time smoothing, previously treated as a heuristic in deterministic OBO, naturally functions as a *variance-reduction mechanism* while keeping bias controlled, clarifying its role across both regimes. Finally, experiments on preconditioner learning and reinforcement learning support our theoretical findings across a variety of nonstationary loss sequences and large-scale, ill-conditioned datasets.
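The following is a minimal, illustrative sketch (not the paper's stated algorithm) of the two ingredients the abstract names: a Bregman outer step and a time-smoothed stochastic hypergradient. The entropic mirror map on the simplex, the helper callables `inner_grad` and `hypergrad_estimate`, the window size, and the step sizes are all assumptions made purely for illustration; the paper's actual estimator and geometry are not specified here.

```python
# Minimal sketch of a single-loop online bilevel update combining
# (i) a Bregman / mirror-descent outer step and
# (ii) a time-smoothed (window-averaged) stochastic hypergradient.
# Placeholder names (inner_grad, hypergrad_estimate, entropic mirror map)
# are illustrative assumptions, not the paper's algorithm.

from collections import deque
import numpy as np


def entropic_bregman_step(x, d, eta):
    """Mirror-descent step with the negative-entropy mirror map on the simplex:
    x_{t+1} = argmin_x  eta*<d, x> + D_psi(x, x_t),  psi(x) = sum_i x_i log x_i,
    whose closed form is x_{t+1} proportional to x_t * exp(-eta * d)."""
    z = x * np.exp(-eta * (d - d.min()))   # shift so the exponent is <= 0 (stability)
    return z / z.sum()


def online_bilevel_mirror(x0, y0, rounds, window, eta_out, eta_in,
                          inner_grad, hypergrad_estimate, inner_steps=5):
    """Single-loop OBO sketch.
    inner_grad(t, x, y): stochastic gradient of the inner loss g_t in y.
    hypergrad_estimate(t, x, y): stochastic hypergradient of the outer loss F_t at x.
    The outer direction averages hypergradient estimates of the last `window`
    losses at the current iterate (time smoothing), which lowers variance."""
    x, y = x0.copy(), y0.copy()
    recent = deque(maxlen=window)          # indices of the most recent outer losses
    for t in range(rounds):
        # Approximately track y*_t(x) with a few stochastic inner steps.
        for _ in range(inner_steps):
            y = y - eta_in * inner_grad(t, x, y)
        recent.append(t)
        # Time-smoothed stochastic hypergradient over the window.
        d = np.mean([hypergrad_estimate(s, x, y) for s in recent], axis=0)
        # Bregman (entropic mirror) outer step in place of a Euclidean gradient step.
        x = entropic_bregman_step(x, d, eta_out)
    return x, y
```

In the Euclidean special case the mirror step reduces to `x - eta_out * d`; swapping in a non-Euclidean mirror map is what the abstract refers to as exploiting Bregman geometry, and the window average is the time smoothing it interprets as a variance-reduction mechanism.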
Primary Area: optimization
Submission Number: 20818