Abstract: Wasserstein distributionally robust optimization (DRO) is an approach to optimization under uncertainty in which the decision maker hedges against a set of probability distributions for the uncertain parameters, specified by a Wasserstein ball. This approach facilitates robust machine learning, yielding models that sustain good performance when the data differ to some extent from the training data. This robustness is related to the well-studied effect of regularization, and the connection between Wasserstein DRO and regularization has been established in several settings. However, existing results often require restrictive assumptions, such as smoothness or convexity, that many important problems do not satisfy. In this paper, we develop a general theory for the variation regularization effect of Wasserstein DRO, a new form of regularization that generalizes total-variation regularization, Lipschitz regularization, and gradient regularization. Our results cover possibly nonconvex and nonsmooth losses, as well as losses on non-Euclidean spaces, and they highlight the bias-variation tradeoff intrinsic in Wasserstein DRO, which balances the empirical mean of the loss against the variation of the loss. Example applications include the multi-item newsvendor problem, linear prediction, neural networks, manifold learning, and intensity estimation for Poisson processes. We also use our theory of variation regularization to derive new generalization guarantees for adversarially robust learning.
Funding: X. Chen is supported by the National Science Foundation [Grant IIS-1845444].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2383.
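To make the "empirical mean plus variation" tradeoff concrete, here is a minimal toy sketch, not the paper's general method. It assumes a one-dimensional quadratic loss f(ξ) = ξ², a type-2 Wasserstein ball of radius ρ around the empirical distribution P_n, and the gradient-regularization instance of variation, taken here to be (E_{P_n}[f'(ξ)²])^{1/2}. The helper names (`second_moment`, `primal_value`, `dual_bound`, `variation_approx`) are hypothetical. In this special case the worst-case risk has the closed form (√m₂ + ρ)², where m₂ is the empirical second moment. A feasible transport plan and a Lagrangian dual bound meet at this value, and it exceeds "empirical loss + ρ · variation" by exactly ρ².

```python
import math

def second_moment(xs):
    """Empirical mean of f(xi) = xi^2 under the uniform distribution on xs."""
    return sum(x * x for x in xs) / len(xs)

def primal_value(xs, rho):
    """Loss under one feasible distribution in the W_2 ball (toy assumption: f = xi^2).

    Each sample is scaled by (1 + t) with t = rho / sqrt(m2), so the W_2 transport
    cost is sqrt(mean((t * xi)^2)) = t * sqrt(m2) = rho.
    """
    m2 = second_moment(xs)
    t = rho / math.sqrt(m2)
    return second_moment([(1.0 + t) * x for x in xs])

def dual_bound(xs, rho, lam):
    """Weak-duality upper bound on the worst-case loss, valid for any lam > 1.

    lam * rho^2 + mean_i sup_z [z^2 - lam * (z - xi_i)^2]; the inner sup equals
    xi_i^2 * lam / (lam - 1), attained at z = lam * xi_i / (lam - 1).
    """
    m2 = second_moment(xs)
    return lam * rho**2 + m2 * lam / (lam - 1.0)

def variation_approx(xs, rho):
    """Empirical loss plus rho times the L2 norm of the gradient f'(xi) = 2 * xi."""
    grad_norm = math.sqrt(sum((2.0 * x) ** 2 for x in xs) / len(xs))
    return second_moment(xs) + rho * grad_norm

xs = [-1.0, 0.5, 2.0, 1.5]
for rho in (0.01, 0.1, 0.5):
    lam_star = 1.0 + math.sqrt(second_moment(xs)) / rho  # minimizer of the dual bound
    exact = primal_value(xs, rho)
    print(f"rho={rho}: worst case={exact:.6f}, "
          f"dual bound={dual_bound(xs, rho, lam_star):.6f}, "
          f"mean + rho*variation={variation_approx(xs, rho):.6f}")
```

Because the primal value matches the dual bound at `lam_star`, the toy worst case is certified exactly. The remaining gap to the variation-regularized objective is the second-order term ρ², so for small radii the robust objective behaves like the empirical mean plus ρ times the loss's gradient variation.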
External IDs: doi:10.1287/opre.2022.2383