Adaptive Single-Pass Stochastic Gradient Descent in Input Sparsity Time

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: stochastic gradient descent, streaming algorithm, stochastic optimization
Abstract: We study sampling algorithms for variance reduction methods for stochastic optimization. Although stochastic gradient descent (SGD) is widely used for large scale machine learning, it sometimes experiences slow convergence rates due to the high variance from uniform sampling. In this paper, we introduce an algorithm that approximately samples a gradient from the optimal distribution for a common finite-sum form with $n$ terms, while just making a single pass over the data, using input sparsity time, and $\tO{Td}$ space. Our algorithm can be implemented in big data models such as the streaming and distributed models. Moreover, we show that our algorithm can be generalized to approximately sample Hessians and thus provides variance reduction for second-order methods as well. We demonstrate the efficiency of our algorithm on large-scale datasets.
One-sentence Summary: We introduce a space and time efficient variance reduction method for stochastic gradient descent that can even be implemented in the distributed/streaming models.
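The core idea behind the "optimal distribution" mentioned in the abstract is classical importance sampling: draw the index $i$ with probability proportional to $\|\nabla f_i(x)\|$ and reweight the sampled gradient so the update stays unbiased. The sketch below illustrates only this baseline on a least-squares finite sum; it is not the paper's single-pass, input-sparsity-time streaming algorithm, and the function name, loss, and step size are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's streaming algorithm): importance-sampled SGD
# for f(x) = (1/n) * sum_i f_i(x). Each step samples index i with probability
# p_i proportional to ||grad f_i(x)|| and reweights by 1/(n * p_i) so the update
# remains an unbiased estimate of the full gradient.
import numpy as np

def importance_sampled_sgd(A, b, T=1000, lr=0.1, seed=0):
    """Least-squares example: f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    row_norms = np.linalg.norm(A, axis=1)
    for _ in range(T):
        residuals = A @ x - b                      # r_i = a_i^T x - b_i
        grad_norms = np.abs(residuals) * row_norms  # ||grad f_i(x)|| = |r_i| * ||a_i||
        total = grad_norms.sum()
        p = grad_norms / total if total > 0 else np.full(n, 1.0 / n)
        i = rng.choice(n, p=p)
        g = residuals[i] * A[i]                    # grad f_i(x)
        x -= lr * g / (n * p[i])                   # unbiased reweighted step
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 10))
    x_true = rng.standard_normal(10)
    b = A @ x_true
    x_hat = importance_sampled_sgd(A, b)
    print("recovery error:", np.linalg.norm(x_hat - x_true))
```

Computing the exact per-term gradient norms each step is what a naive implementation would do; the submission's contribution is approximating this sampling distribution in a single pass with sketching-style space and time guarantees.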
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=8SJEzZykuD