Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Jikai Jin; Bohang Zhang; Haiyang Wang; Liwei Wang

Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 PosterReaders: Everyone

Keywords: distributionally robust optimization, non-convex optimization, first order methods

Abstract: Distributionally robust optimization (DRO) is a widely-used approach to learn models that are robust against distribution shift. Compared with the standard optimization setting, the objective function in DRO is more difficult to optimize, and most of the existing theoretical results make strong assumptions on the loss function. In this work we bridge the gap by studying DRO algorithms for general smooth non-convex losses. By carefully exploiting the specific form of the DRO objective, we are able to provide non-asymptotic convergence guarantees even though the objective function is possibly non-convex, non-smooth and has unbounded gradient noise. In particular, we prove that a special algorithm called the mini-batch normalized gradient descent with momentum, can find an $\epsilon$-first-order stationary point within $\mathcal O(\epsilon^{-4})$ gradient complexity. We also discuss the conditional value-at-risk (CVaR) setting, where we propose a penalized DRO objective based on a smoothed version of the CVaR that allows us to obtain a similar convergence guarantee. We finally verify our theoretical results in a number of tasks and find that the proposed algorithm can consistently achieve prominent acceleration.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

TL;DR: We provide the first non-asymptotic analysis of non-convex distributionally robust optimization, bypassing the difficulty in the non-smoothness and unbounded gradient noise of the objective function.

12 Replies

Loading