High Probability Guarantees for Random Reshuffling

Published: 23 Oct 2023, Last Modified: 27 Nov 2023, HeavyTails 2023
Keywords: random reshuffling, shuffled SGD, high-probability sample complexity, stopping criterion, last-iterate result
TL;DR: We present a high-probability complexity analysis for SGD with random reshuffling, without any assumptions on the stochastic gradient errors.
Abstract: We study the probabilistic behavior of stochastic gradient descent with random reshuffling (RR) on nonconvex problems. We prove that the same complexity (up to a logarithmic term) as in the in-expectation case also holds with high probability, which characterizes the performance of RR for a single run rather than an average over infinitely many realizations. Our analysis imposes no additional assumptions on the stochastic gradient errors and thus admits heavy tails. This is in contrast to high-probability analyses of SGD that rely on sub-Gaussian stochastic gradient errors or on techniques such as clipping or momentum. Furthermore, leveraging the established high-probability error bounds, we propose a simple stopping criterion for RR that incurs little additional computational cost. We prove that the function value strictly decreases with high probability before the stopping criterion is triggered, ensuring that the criterion will indeed be activated. Finally, a "last iterate" result is established for the iterate returned under this stopping criterion. We believe that our new developments for RR serve as a stepping stone towards enabling more refined high-probability analyses of its performance.
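To make the setting concrete, below is a minimal sketch of SGD with random reshuffling on a toy finite-sum least-squares problem. The stopping rule shown (halting when the per-epoch displacement is small) is purely illustrative and is not the paper's criterion; the function `rr_sgd`, its parameters, and the tolerance are all assumptions introduced for this example.

```python
import numpy as np

def rr_sgd(grad_i, x0, n, lr=0.01, epochs=50, tol=1e-3, seed=0):
    """SGD with random reshuffling (RR): each epoch draws a fresh
    permutation of the n component indices and takes one gradient
    step per component, so every epoch is a full pass over the data."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(epochs):
        perm = rng.permutation(n)      # fresh shuffle each epoch
        start = x.copy()
        for i in perm:
            x -= lr * grad_i(x, i)
        # Illustrative stopping rule (NOT the paper's criterion):
        # stop when the norm of the average step over the epoch is
        # tiny, a cheap proxy for a small average gradient.
        if np.linalg.norm(x - start) / (lr * n) < tol:
            break
    return x

# Toy finite-sum problem: f(x) = (1/n) * sum_i (a_i @ x - b_i)^2
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
x_true = rng.normal(size=5)
b = A @ x_true
grad_i = lambda x, i: 2.0 * (A[i] @ x - b[i]) * A[i]

x_hat = rr_sgd(grad_i, np.zeros(5), n=100)
```

The key distinction from vanilla SGD is sampling without replacement within each epoch: the permutation guarantees every component gradient is visited exactly once per pass.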
Submission Number: 20