adaStar: A Method for Adapting to Interpolation

Published: 23 Nov 2022, Last Modified: 05 May 2023, OPT 2022 Poster, Readers: Everyone
Abstract: Stochastic convex optimization methods minimize \textit{interpolation problems} (problems where all sample losses share a common minimizer) much faster than non-interpolating problems. However, to attain these fast rates, standard non-adaptive stochastic gradient methods require step sizes tailored to the interpolation setting, and these step sizes are sub-optimal for non-interpolating problems. This is problematic because verifying whether a problem is interpolating, without minimizing it, is difficult. Moreover, interpolation is not a stable property: small changes to the data distribution can transform an interpolation problem into a non-interpolating one. We would therefore like our methods to attain the fast interpolation rate when they can, while being robust to these perturbations. Stochastic gradient methods with adaptive step sizes can achieve these two desiderata in expectation [Orabona 2019]. In this work, we build on these ideas and present adaStar, an adaptive stochastic gradient method which, with high probability, attains the optimal, fast rate on smooth interpolation problems (up to log factors) and gracefully degrades with the minimal objective value for non-interpolating problems. This high probability result is crucial for our second result, where we use adaStar as a building block to construct another stochastic gradient method, termed adaStar-G, which adapts to interpolation and growth conditions, getting even faster rates.