Abstract: Bayesian methods, distributionally robust optimization methods, and regularization methods are three pillars of machine learning under distributional uncertainty, e.g., the uncertainty of an empirical distribution relative to the true underlying distribution. This paper investigates the connections among the three frameworks and, in particular, explores why they tend to have smaller generalization errors. Specifically, first, we suggest a quantitative definition of "distributional robustness", propose the concept of a "robustness measure", and formalize several philosophical concepts in distributionally robust optimization. Second, we show that Bayesian methods are distributionally robust in the probably approximately correct (PAC) sense; in addition, by constructing a Dirichlet-process-like prior in Bayesian nonparametrics, we prove that any regularized empirical risk minimization method is equivalent to a Bayesian method. Third, we show that the generalization errors of machine learning models can be characterized using the distributional uncertainty of the nominal distribution and the robustness measures of these models; this explains, in a unified manner, why distributionally robust optimization models, Bayesian models, and regularization models tend to have smaller generalization errors.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Fred_Roosta1
Submission Number: 4014