DASGrad: Double Adaptive Stochastic Gradient

Kin Gutierrez; Cristian Challu; Jin Li; Artur Dubrawski

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

Keywords: stochastic convex optimization, adaptivity, online learning, transfer learning

TL;DR: Stochastic gradient descent with adaptive moments and adaptive probabilities

Abstract: Adaptive moment methods have been remarkably successful for optimization in the presence of high-dimensional or sparse gradients. In parallel, adaptive sampling probabilities for SGD have allowed optimizers to improve convergence rates by prioritizing examples in order to learn efficiently. Numerous applications have implicitly combined adaptive moment methods with adaptive probabilities, yet the theoretical guarantees of such procedures have not been explored. We formalize double adaptive stochastic gradient methods (DASGrad) as an optimization technique and analyze their convergence improvements in a stochastic convex optimization setting. We provide empirical validation of our findings with convex and non-convex objectives. We observe that the benefits of the method increase with model complexity and the variability of the gradients, and we explore the resulting utility in extensions to transfer learning.
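
Illustration (not taken from the paper): the abstract combines two kinds of adaptivity, adaptive moments and adaptive sampling probabilities. The minimal sketch below shows one way the two could fit together on a one-dimensional least-squares problem. It assumes Adam-style moment estimates, sampling probabilities proportional to per-example gradient magnitude mixed with a uniform floor, and a 1/(n p_i) importance weight to keep the gradient estimate unbiased. The function name, the `mix` parameter, and all hyperparameter values are hypothetical choices for this example; the authors' actual algorithm is the one in the paper and the linked repository.

```python
import math
import random


def double_adaptive_sgd_sketch(xs, ys, steps=2000, lr=0.05, beta1=0.9,
                               beta2=0.999, eps=1e-8, mix=0.1, seed=0):
    """Fit y ~ w * x with Adam-style moments and gradient-based sampling.

    Assumption: this is an illustrative sketch, not the DASGrad algorithm
    as specified by the authors.
    """
    rng = random.Random(seed)
    n = len(xs)
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Per-example gradient of 0.5 * (w * x - y)**2 with respect to w.
        grads = [(w * x - y) * x for x, y in zip(xs, ys)]
        norms = [abs(g) for g in grads]
        total = sum(norms)
        if total > 0.0:
            # Adaptive probabilities: proportional to gradient magnitude,
            # mixed with uniform so every example keeps positive probability.
            probs = [(1.0 - mix) * g / total + mix / n for g in norms]
        else:
            probs = [1.0 / n] * n
        i = rng.choices(range(n), weights=probs, k=1)[0]
        # Importance weight 1 / (n * p_i) makes the sampled gradient an
        # unbiased estimate of the mean gradient over all examples.
        g = grads[i] / (n * probs[i])
        # Adaptive moments: Adam-style first and second moment estimates
        # with bias correction.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


if __name__ == "__main__":
    # Noise-free data generated with slope 2, so the minimizer is w = 2.
    print(double_adaptive_sgd_sketch([1.0, 2.0, 3.0, 4.0],
                                     [2.0, 4.0, 6.0, 8.0]))
```

Recomputing every per-example gradient at each step, as this sketch does, is only practical for toy problems; it is done here to keep the sampling rule explicit.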

Code: https://github.com/kdgutier/dasgrad
