Stochastic Reweighted Gradient Descent

Ayoub El Hanchi; Chris J. Maddison; David Alan Stephens

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submitted · Readers: Everyone

Keywords: Stochastic gradient descent, Finite-sum optimization, Variance reduction, Importance sampling

Abstract: Importance sampling is a promising strategy for improving the convergence rate of stochastic gradient methods. It is typically used to precondition the optimization problem, but it can also be used to reduce the variance of the gradient estimator. Unfortunately, this latter point of view has yet to lead to practical methods that improve the asymptotic error of stochastic gradient methods. In this work, we propose stochastic reweighted gradient (SRG), a variance-reduced stochastic gradient method based solely on importance sampling that can improve on the asymptotic error of stochastic gradient descent (SGD) in the strongly convex and smooth case. We show that SRG can be extended to combine the benefits of both importance-sampling-based preconditioning and variance reduction. When compared to SGD, the resulting algorithm can simultaneously reduce the condition number and the asymptotic error, both by up to a factor equal to the number of component functions. We demonstrate improved convergence in practice on $\ell_2$-regularized logistic regression problems.
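To make the variance-reduction view of importance sampling concrete, here is a minimal sketch. It is not the SRG algorithm from the paper. It assumes a toy set of per-component gradients at a fixed iterate (made-up numbers) and compares uniform sampling with sampling proportional to gradient norms, the classical variance-minimizing choice for this estimator. The helper names `reweighted_estimates` and `mean_and_variance` are hypothetical.

```python
import math

def reweighted_estimates(grads, probs):
    """Return the n possible values of the estimator g_i / (n * p_i)."""
    n = len(grads)
    return [[x / (n * p) for x in g] for g, p in zip(grads, probs)]

def mean_and_variance(grads, probs):
    """Exact mean and total variance of the reweighted estimator
    when index i is drawn with probability probs[i]."""
    est = reweighted_estimates(grads, probs)
    d = len(grads[0])
    mean = [sum(p * e[k] for p, e in zip(probs, est)) for k in range(d)]
    var = sum(
        p * sum((e[k] - mean[k]) ** 2 for k in range(d))
        for p, e in zip(probs, est)
    )
    return mean, var

# Toy per-component gradients at a fixed iterate (illustrative values only).
grads = [[4.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.2, 0.1]]
n = len(grads)

# Uniform sampling, as in plain SGD.
uniform = [1.0 / n] * n

# Sampling proportional to gradient norms (assumed known here for illustration).
norms = [math.sqrt(sum(x * x for x in g)) for g in grads]
by_norm = [v / sum(norms) for v in norms]

# Full gradient of the finite-sum objective (average of the components).
full_grad = [sum(g[k] for g in grads) / n for k in range(2)]

mean_u, var_u = mean_and_variance(grads, uniform)
mean_p, var_p = mean_and_variance(grads, by_norm)

print("full gradient:      ", full_grad)
print("uniform mean, var:  ", mean_u, var_u)
print("by-norm mean, var:  ", mean_p, var_p)
```

Both estimators have the full gradient as their mean, since dividing by `n * p_i` cancels the sampling bias. Only the variance changes with the sampling distribution. In practice the per-component gradient norms are not available without computing every gradient, which is one reason practical methods of this kind are hard to design.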

One-sentence Summary: We introduce the first importance-sampling-based variance reduction algorithm for finite-sum optimization with convergence rate guarantees under standard assumptions.

Supplementary Material: zip
