Keywords: stochastic optimization, nonsmooth optimization, robust learning
TL;DR: We apply the stochastic prox-linear method to CVaR minimization and show that it works for a larger range of step sizes than the stochastic subgradient method.
Abstract: We develop an instance of the stochastic prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective. CVaR is a risk measure that focuses on worst-case performance: it is defined as the average of the losses in the upper tail of the loss distribution, that is, those above a given quantile. In machine learning, such a risk measure is useful for training more robust models. Although the stochastic subgradient method (SGM) is a natural choice for minimizing the CVaR objective, we show that the prox-linear algorithm better exploits the structure of the objective while still admitting a convenient closed-form update. We then specialize a general convergence theorem for the prox-linear method to our setting and show that it allows a wider range of step sizes than SGM. We support this theoretical finding experimentally by showing that the performance of the stochastic prox-linear method is more robust to the choice of step size than that of SGM.
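As a rough illustration of the CVaR definition used in the abstract, the sketch below computes an empirical CVaR in two ways: directly, as the mean of the largest losses, and through the standard Rockafellar-Uryasev variational form, t + E[max(loss - t, 0)] / (1 - alpha), which is minimized over the scalar t. The function names and the confidence-level symbol `alpha` are assumptions made for this example; the abstract does not fix a notation, and this is not the paper's prox-linear update.

```python
import math


def empirical_cvar(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of the losses.

    `alpha` is an assumed name for the confidence level, with 0 <= alpha < 1.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest losses first
    # Number of tail samples; rounding guards against floating-point noise.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


def ru_objective(t, losses, alpha):
    """Rockafellar-Uryasev objective evaluated at threshold t.

    Minimizing this over t gives the CVaR at level alpha.
    """
    excess = sum(max(loss - t, 0.0) for loss in losses)
    return t + excess / ((1.0 - alpha) * len(losses))


losses = [float(i) for i in range(1, 11)]  # toy losses 1.0, ..., 10.0
tail_average = empirical_cvar(losses, alpha=0.8)
# The objective is piecewise linear in t, so its minimum over t is attained
# at one of the sample losses; a scan over them is enough for this toy case.
variational = min(ru_objective(t, losses, 0.8) for t in losses)
print(tail_average, variational)
```

On this toy sample both computations agree up to floating-point error. In CVaR minimization the threshold t is typically optimized jointly with the model parameters, and the max(., 0) term is what makes the objective nonsmooth.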