Keywords: stochastic algorithm, distributionally robust optimization, Sinkhorn distance
TL;DR: We propose a new dual formulation of Sinkhorn-distance-regularized Distributionally Robust Optimization, and design and analyze a nested algorithmic framework based on stochastic gradient descent.
Abstract: Distributionally Robust Optimization (DRO) is a powerful modeling technique for tackling the challenge
posed by data distribution shifts. This paper focuses on Sinkhorn-distance-regularized DRO.
We generalize the Sinkhorn distance to allow a broader choice of functions for modeling the ambiguity set, and
derive the Lagrangian dual, which takes the form of a nested stochastic program. We also design an
algorithm based on stochastic gradient descent with an easy-to-implement constant learning rate. Unlike
previous work, which analyzes algorithms only for convex and bounded loss functions, our algorithm
provides convergence guarantees for non-convex and possibly unbounded loss functions under a proper
choice of sampling batch size. The resulting sample complexity for finding an $\epsilon$-stationary point is
independent of the data size and the parameter dimension, so our model and
algorithm are suitable for large-scale applications.
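As a rough illustration of the nested stochastic gradient scheme the abstract describes, the sketch below runs constant-step-size SGD on a log-exp smoothed worst-case objective of the form $\lambda \log \mathbb{E}_z[\exp(\ell(\theta, z)/\lambda)]$, estimated with an inner Monte Carlo batch. The squared loss, the Gaussian inner sampler standing in for the conditional distribution induced by the ambiguity set, the fixed dual weight `lam`, and all hyperparameter values are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of nested SGD for a Sinkhorn-style DRO dual.
# All modeling choices below are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (assumption: any smooth loss works).
n, d = 1000, 10
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

lam = 0.5      # entropic regularization weight (assumed fixed here)
sigma = 0.1    # bandwidth of the inner (conditional) sampling kernel
lr = 0.05      # constant learning rate, as in the abstract
m_inner = 16   # inner batch size controlling the estimator's bias
steps = 2000

def loss_and_grad(theta, x, t):
    """Per-sample squared loss and its gradient in theta."""
    r = x @ theta - t
    return 0.5 * r**2, r * x

theta = np.zeros(d)
for _ in range(steps):
    i = rng.integers(n)  # outer sample (x_i, y_i) drawn from the data
    # Inner samples z_j from a Gaussian kernel around x_i: a stand-in
    # for the conditional distribution over the ambiguity set.
    Z = X[i] + sigma * rng.normal(size=(m_inner, d))
    losses, grads = zip(*(loss_and_grad(theta, z, y[i]) for z in Z))
    losses = np.array(losses)
    # Softmax weights give the gradient of lam * log E[exp(loss/lam)],
    # the log-exp (smoothed worst-case) structure of the nested dual.
    w = np.exp((losses - losses.max()) / lam)
    w /= w.sum()
    g = sum(wj * gj for wj, gj in zip(w, grads))
    theta -= lr * g  # constant-step-size SGD update

print("estimation error:", np.linalg.norm(theta - theta_true))
```

The inner batch size `m_inner` plays the role of the sampling batch size mentioned in the abstract: a larger inner batch reduces the bias of the nested log-expectation estimator at higher per-step cost.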
Submission Number: 22