Accelerating first-order methods for nonconvex-strongly-convex bilevel optimization under general smoothness
Keywords: Bilevel Optimization, Hölder Continuity, Accelerated Gradient Method
Abstract: Bilevel optimization is pivotal in machine learning applications such as hyperparameter tuning and adversarial training. While existing methods for nonconvex-strongly-convex bilevel optimization can find an $\epsilon$-stationary point under Lipschitz continuity assumptions, two critical gaps persist: improving the algorithmic complexity and relaxing the smoothness assumptions. This paper addresses both challenges by introducing an accelerated framework under Hölder continuity, a broader class of smoothness conditions that subsumes Lipschitz continuity.
We propose a restarted accelerated gradient method that leverages inexact hypergradient estimators, and we establish its oracle complexity for finding $\epsilon$-stationary points. Experiments on data hypercleaning and hyperparameter optimization demonstrate faster convergence than state-of-the-art baselines.
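For reference, a gradient map $\nabla F$ is $(H,\nu)$-Hölder continuous if $\|\nabla F(u) - \nabla F(v)\| \le H \|u - v\|^{\nu}$ for some $\nu \in (0, 1]$; the choice $\nu = 1$ recovers the standard Lipschitz-continuous-gradient condition. The sketch below is not the paper's algorithm: it is a minimal illustration of the two ingredients named in the abstract, an inexact hypergradient estimator (inner gradient descent on the lower-level problem plus a conjugate-gradient solve for the implicit-function term) and a Nesterov-style accelerated outer loop with restarts, applied to a toy quadratic bilevel problem. The objectives, step sizes, and fixed restart period are illustrative assumptions, not choices taken from the paper.

```python
"""Minimal sketch (illustrative, not the paper's exact method): an inexact
hypergradient estimator plus a restarted accelerated outer loop on a toy
bilevel problem whose lower level is strongly convex in y.

  upper level:  f(x, y) = 0.5*||x - a||^2 + 0.5*||y - b||^2
  lower level:  g(x, y) = 0.5*||y - C x||^2 + 0.5*lam*||y||^2
"""
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y, lam = 5, 5, 0.1
a, b = rng.normal(size=d_x), rng.normal(size=d_y)
C = 0.1 * rng.normal(size=(d_y, d_x))

grad_x_f = lambda x, y: x - a                       # partial derivative of f w.r.t. x
grad_y_f = lambda x, y: y - b                       # partial derivative of f w.r.t. y
grad_y_g = lambda x, y: (1.0 + lam) * y - C @ x     # partial derivative of g w.r.t. y
hess_yy_g = lambda x, y, v: (1.0 + lam) * v         # Hessian of g in y, applied to v
jac_yx_g_T = lambda x, y, v: -C.T @ v               # transposed cross-derivative of grad_y_g, applied to v


def inexact_hypergradient(x, y, inner_steps=20, cg_steps=20, inner_lr=0.5):
    """Estimate grad Phi(x) = grad_x f - (d/dx grad_y g)^T [hess_yy g]^(-1) grad_y f."""
    # Approximate y*(x) with a few gradient steps on the lower-level problem.
    for _ in range(inner_steps):
        y = y - inner_lr * grad_y_g(x, y)
    # Approximately solve hess_yy_g(x, y) v = grad_y_f(x, y) by conjugate gradient.
    v = np.zeros_like(y)
    r = grad_y_f(x, y) - hess_yy_g(x, y, v)
    p = r.copy()
    for _ in range(cg_steps):
        Hp = hess_yy_g(x, y, p)
        alpha = (r @ r) / (p @ Hp + 1e-12)
        v = v + alpha * p
        r_new = r - alpha * Hp
        if np.linalg.norm(r_new) < 1e-10:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return grad_x_f(x, y) - jac_yx_g_T(x, y, v), y


def restarted_agd(x0, outer_iters=200, outer_lr=0.1, restart_every=20):
    """Nesterov-style momentum on x, restarted every `restart_every` iterations."""
    x, z, t = x0.copy(), x0.copy(), 1.0
    y = np.zeros(d_y)                  # warm-started lower-level iterate
    for k in range(outer_iters):
        g_hat, y = inexact_hypergradient(z, y)
        x_next = z - outer_lr * g_hat
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
        if (k + 1) % restart_every == 0:   # simple fixed-period restart of the momentum
            z, t = x.copy(), 1.0
    return x


x_hat = restarted_agd(np.zeros(d_x))
# Crude stationarity check against the analytic hypergradient of this toy problem.
y_star = C @ x_hat / (1.0 + lam)
grad_phi = (x_hat - a) + C.T @ (y_star - b) / (1.0 + lam)
print("approximate hypergradient norm at x_hat:", np.linalg.norm(grad_phi))
```

In this sketch the restart is triggered on a fixed schedule; an adaptive trigger (e.g., restarting when the objective or the estimated gradient norm stops decreasing) is a common alternative and may be closer in spirit to what a restarted accelerated scheme would use in practice.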
Supplementary Material: zip
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 10228