On the Convergence of Adam-Type Algorithms for Bilevel Optimization under Unbounded Smoothness

Xiaochuan Gong; Jie Hao; Mingrui Liu

On the Convergence of Adam-Type Algorithms for Bilevel Optimization under Unbounded Smoothness

Xiaochuan Gong, Jie Hao, Mingrui Liu

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Bilevel Optimization, Adam, Unbounded Smoothness

TL;DR: This paper design Adam-type algorithms for bilevel optimization under unbounded smoothness, with provable convergence guarantees.

Abstract: Adam has become one of the most popular optimizers for training modern deep neural networks, such as transformers. However, its applicability is largely restricted to single-level optimization problems. In this paper, we aim to extend vanilla Adam to tackle bilevel optimization problems, which have important applications in machine learning, such as meta-learning. In particular, we study stochastic bilevel optimization problems where the lower-level function is strongly convex and the upper-level objective is nonconvex with potentially unbounded smoothness. This unbounded smooth objective function covers a broad class of neural networks, including transformers, which may exhibit non-Lipschitz gradients. In this work, we first introduce AdamBO, a single-loop Adam-type method that achieves $\widetilde{O}(\epsilon^{-4})$ oracle complexity to find $\epsilon$-stationary points, where the oracle calls involve stochastic gradient or Hessian/Jacobian-vector product evaluations. The key to our analysis is a novel randomness decoupling lemma that provides refined control over the lower-level variable. Additionally, we propose VR-AdamBO, a variance-reduced version with an improved oracle complexity of $\widetilde{O}(\epsilon^{-3})$. The improved analysis is based on a novel stopping time approach and a careful treatment of the lower-level error. We conduct extensive experiments on various machine learning tasks involving bilevel formulations with recurrent neural networks (RNNs) and transformers, demonstrating the effectiveness of our proposed Adam-type algorithms.

Supplementary Material: zip

Primary Area: optimization

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 12164

Loading