Keywords: algorithmic stability, generalization bounds, excess risk bounds
TL;DR: We derive high probability excess risk bounds of order up to $O(1/n^2)$ for ERM, GD and SGD, and our high probability results on the generalization error of gradients for nonconvex problems are also the sharpest known.
Abstract: The sharpest known high probability excess risk bounds are up to $O(1/n)$ for empirical risk minimization and projected gradient descent via algorithmic stability [Klochkov and Zhivotovskiy, 2021]. In this paper, we show that high probability excess risk bounds of order up to $O(1/n^2)$ are possible. We discuss how high probability excess risk bounds reach $O(1/n^2)$ under strong convexity, smoothness and Lipschitz continuity assumptions for empirical risk minimization, projected gradient descent and stochastic gradient descent. Moreover, to the best of our knowledge, our high probability results on the generalization gap measured by gradients for nonconvex problems are also the sharpest.
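To make the algorithmic setting concrete, here is a minimal sketch (not taken from the paper) of projected stochastic gradient descent on a strongly convex, smooth empirical risk over a bounded domain, so the loss is Lipschitz on it. The ridge-regularized least-squares objective, the ball radius, and the step-size schedule are illustrative assumptions, not the paper's construction.

```python
# Minimal illustrative sketch: projected SGD on a strongly convex, smooth
# empirical risk (ridge-regularized least squares), restricted to a Euclidean
# ball so the loss is Lipschitz on the domain. All parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, R = 200, 5, 0.1, 10.0   # sample size, dimension, strong convexity parameter, domain radius
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(w, i):
    """Gradient of the i-th regularized least-squares loss at w."""
    return (X[i] @ w - y[i]) * X[i] + lam * w

def project(w):
    """Euclidean projection onto the ball of radius R."""
    norm = np.linalg.norm(w)
    return w if norm <= R else (R / norm) * w

# Projected SGD: sample one index per step, take a gradient step, project back.
w = np.zeros(d)
for t in range(1, 5000):
    i = rng.integers(n)
    eta = 1.0 / (lam * t)          # classical 1/(mu*t) schedule for strongly convex risks
    w = project(w - eta * grad_i(w, i))

emp_risk = 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * np.linalg.norm(w) ** 2
print(f"empirical risk after projected SGD: {emp_risk:.4f}")
```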
Primary Area: Learning theory
Submission Number: 10780