TSP: A Two-Sided Smoothed Primal-Dual Method for Nonconvex Bilevel Optimization

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: This work proposes a stochastic first-order method with smoothing on both primal and dual variables, capable of finding KKT points for a broad class of nonconvex bilevel problems, with provable theoretical guarantees.
Abstract: Extensive research has shown that a wide range of machine learning problems can be formulated as bilevel optimization, in which two levels of learning processes are intertwined through distinct sets of optimization variables. However, prevailing approaches often impose stringent assumptions, such as strong convexity of the lower-level loss function or uniqueness of its optimal solution, to enable algorithmic development and convergence analysis; these assumptions tend to be overly restrictive in real-world scenarios. In this work, we explore a recently popularized Moreau envelope-based reformulation of bilevel optimization problems that accommodates nonconvex objective functions at both levels. We propose a stochastic primal-dual method that incorporates smoothing on both the primal and dual variables and is capable of finding Karush-Kuhn-Tucker (KKT) solutions for this general class of nonconvex bilevel optimization problems. A key feature of our algorithm is its ability to dynamically weigh the lower-level problems, enhancing its performance, particularly in stochastic learning scenarios. Numerical experiments underscore the superiority of the proposed algorithm over existing penalty-based methods in terms of both convergence rate and test accuracy.
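For context, a minimal sketch of the Moreau envelope-based reformulation commonly used in this line of work; the paper's exact formulation, scaling, and notation may differ:

\begin{align*}
  & \text{Bilevel problem:} &&
    \min_{x,\,y}\; f(x, y)
    \quad \text{s.t.} \quad
    y \in \operatorname*{arg\,min}_{\theta}\; g(x, \theta), \\[4pt]
  & \text{Moreau envelope of the lower level } (\gamma > 0): &&
    v_{\gamma}(x, y) \;=\; \min_{\theta}\;
      \Big\{ g(x, \theta) + \tfrac{1}{2\gamma}\,\|\theta - y\|^{2} \Big\}, \\[4pt]
  & \text{Single-level reformulation:} &&
    \min_{x,\,y}\; f(x, y)
    \quad \text{s.t.} \quad
    g(x, y) - v_{\gamma}(x, y) \;\le\; 0 .
\end{align*}

Replacing the lower-level argmin with the inequality constraint above yields a constrained single-level problem whose KKT points can be targeted by a stochastic primal-dual method, which is the setting the abstract describes.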
Lay Summary: Many machine learning problems involve two interconnected tasks and can be formulated as bilevel optimization; for example, tuning a model while it is still learning. Solving this class of problems is challenging, especially when both tasks are complex and may admit multiple optimal solutions. We propose a new, efficient algorithm that handles such realistic, nonconvex settings without relying on the strict assumptions common in existing methods. By smoothing both levels and adopting a primal-dual optimization approach, our method finds high-quality solutions effectively, even in stochastic environments. Experiments show that our approach outperforms existing methods in both speed and accuracy, making it a powerful tool for solving advanced machine learning problems.
Primary Area: Optimization->Non-Convex
Keywords: Primal-dual method, bilevel optimization, stochastic gradient descent
Submission Number: 8902