Rethinking Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-loop and Hessian-free Solution Strategy

Risheng Liu; Zhu Liu; Wei Yao; Shangzhi Zeng; Jin Zhang

21 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024

Supplementary Material: zip

Primary Area: optimization

Keywords: Bi-level Optimization, Nonconvex, Hessian-free, Single-loop, Moreau Envelope, Convergence Analysis, Non-asymptotic

TL;DR: We introduce an innovative single-loop Hessian-free algorithm with non-asymptotic convergence guarantees for general nonconvex bi-level optimization problems.

Abstract: Bi-Level Optimization (BLO) has found diverse applications in machine learning due to its ability to model nested structures. Addressing large-scale BLO problems for complex learning tasks presents two significant challenges: ensuring computational efficiency and providing theoretical guarantees. Recent advances in scalable BLO algorithms have predominantly relied on the simplifying assumption of lower-level convexity. In this context, our work takes on the challenge of large-scale BLO problems that are nonconvex in both the upper and lower levels, addressing the computational and theoretical challenges simultaneously. Specifically, by utilizing a Moreau envelope-based reformulation, we introduce an innovative single-loop gradient-based algorithm with non-asymptotic convergence analysis for general nonconvex BLO problems. Notably, this algorithm relies solely on first-order gradient information, making it practical and efficient, particularly for large-scale BLO learning tasks. We validate the effectiveness of our approach on a series of synthetic problems, two typical hyper-parameter learning tasks, and a real-world neural architecture search application. These experiments collectively substantiate the superior performance of our method.
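
For readers unfamiliar with the Moreau envelope named in the abstract, here is a minimal sketch of the general concept. It is not the paper's algorithm. It assumes the convex toy function f(t) = |t|, for which the envelope min_t { f(t) + (t - y)^2 / (2*gamma) } has a closed-form minimizer (soft-thresholding). The function name `moreau_envelope_abs` and the parameter `gamma` are hypothetical, chosen only for illustration. The point of the sketch is that the envelope's gradient is obtained from the minimizer alone, with no second-order information.

```python
def moreau_envelope_abs(y: float, gamma: float):
    """Moreau envelope of the toy function f(t) = |t| at the point y.

    Illustrative assumption: f is convex and one-dimensional, so the inner
    minimizer (the proximal point) is given by soft-thresholding.
    Returns (envelope value, proximal point, envelope gradient).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")

    # Proximal point: argmin_t |t| + (t - y)^2 / (2 * gamma).
    sign = 1.0 if y >= 0 else -1.0
    prox = sign * max(abs(y) - gamma, 0.0)

    # Envelope value: the inner objective evaluated at its minimizer.
    value = abs(prox) + (prox - y) ** 2 / (2.0 * gamma)

    # For convex f the envelope is differentiable with this gradient,
    # which needs only the proximal point (no Hessian).
    grad = (y - prox) / gamma
    return value, prox, grad


if __name__ == "__main__":
    print(moreau_envelope_abs(3.0, 1.0))
    print(moreau_envelope_abs(0.5, 1.0))
```

In this toy case the envelope coincides with the Huber function: quadratic near zero and linear further out. The nonconvex bi-level setting the abstract targets is more involved, and the exact reformulation used there should be taken from the paper itself.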

Submission Number: 3145
