Stochastic Smoothed Primal-Dual Algorithms for Nonconvex Optimization with Linear Inequality Constraints
TL;DR: We develop optimal single-loop primal-dual algorithms for stochastic nonconvex optimization with linear inequality constraints.
Abstract: We propose smoothed primal-dual algorithms for solving stochastic nonconvex optimization problems with linear \emph{inequality} constraints. Our algorithms are single-loop and require only one (or two) stochastic gradient samples per iteration. A defining feature of our algorithms is an inexact gradient descent framework for the Moreau envelope, in which the gradient of the Moreau envelope is estimated by one step of a stochastic primal-dual (linearized) augmented Lagrangian algorithm. To handle inequality constraints and stochasticity, we combine recently established global error bounds in constrained optimization with a Moreau envelope-based analysis of stochastic proximal algorithms. We establish the optimal (in their respective cases) $O(\varepsilon^{-4})$ and $O(\varepsilon^{-3})$ sample complexity guarantees for our algorithms and provide extensions to stochastic linear constraints. Unlike existing methods, the iterations of our algorithms are free of subproblems, large batch sizes, and increasing penalty parameters, and they use dual variable updates to ensure feasibility.
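For intuition, the sketch below illustrates the kind of single-loop scheme the abstract describes; it is a generic Moreau envelope-based primal-dual template, not the paper's exact update rules, and the notation ($f$, $A$, $b$, step sizes $\tau$, $\sigma$, $\alpha$, smoothing parameter $\lambda$) is illustrative.

```latex
% Illustrative only: a generic smoothed single-loop primal-dual scheme for
%   min_x  E_\xi[ f(x,\xi) ]   subject to   A x \le b.
% The paper's actual linearization, step sizes, and parameter choices may differ.
\[
  \phi_\lambda(z) \;=\; \min_{x}\Big\{ \phi(x) + \tfrac{1}{2\lambda}\,\|x - z\|^2 \Big\},
  \qquad
  \nabla \phi_\lambda(z) \;=\; \tfrac{1}{\lambda}\big(z - \operatorname{prox}_{\lambda\phi}(z)\big).
\]
\[
  \begin{aligned}
    x_{k+1} &= x_k - \tau\Big( \nabla f(x_k;\xi_k) + A^{\top} y_k + \tfrac{1}{\lambda}(x_k - z_k) \Big)
      && \text{(one stochastic linearized augmented Lagrangian primal step)}\\
    y_{k+1} &= \big[\, y_k + \sigma\,(A x_{k+1} - b) \,\big]_{+}
      && \text{(dual update, projected onto } y \ge 0\text{)}\\
    z_{k+1} &= z_k + \alpha\,(x_{k+1} - z_k)
      && \text{(inexact gradient step on the Moreau envelope)}
  \end{aligned}
\]
```

Here $x_{k+1}$ plays the role of an approximate proximal point, so $(z_k - x_{k+1})/\lambda$ serves as the inexact Moreau envelope gradient used to move the anchor $z_k$.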
Lay Summary: Current machine learning systems train neural networks under constraints, which can be, for example, safety limits or a desired functionality of the network. Such problems are modeled as what are called "constrained optimization problems". In modern machine learning, due to the structure and large size of neural networks and the vast amounts of data, these problems lack the property called "convexity", and the algorithms used in practice are "stochastic", that is, they use only a fraction of the available data at every iteration.
Our paper focuses on a special case of the above problem, one where the constraints are given by linear functions. We propose and theoretically analyze algorithms that are similar to those used in practice, and we provide guarantees on the amount of computational resources they need to produce an "approximately good" point for our problem. These guarantees are of the same order as the best possible guarantees achievable by algorithms of the type we analyze for this problem.
Theoretical guarantees assure practitioners that the method they use to solve a problem will behave as expected in practice. Moreover, such guarantees guide the design of faster algorithms that require fewer computational resources to output a point as good as that produced by more computationally demanding algorithms.
Primary Area: Optimization->Non-Convex
Keywords: primal-dual algorithms, stochastic optimization, single-loop algorithms, linear inequality constraints
Submission Number: 14618