Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD

Published: 25 Sept 2024 · Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY 4.0
Keywords: Streaming estimation, heavy-tailed estimation, clipped SGD
TL;DR: Near-optimal sub-Gaussian convergence rates for streaming statistical estimation with Clipped-SGD in the presence of heavy-tailed data.
Abstract: $\newcommand{\Tr}{\mathsf{Tr}}$ We consider the problem of high-dimensional heavy-tailed statistical estimation in the streaming setting, which is much harder than the traditional batch setting due to memory constraints. We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped-SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. More precisely, with $T$ samples, we show that Clipped-SGD, for smooth and strongly convex objectives, achieves an error of $\sqrt{\frac{\Tr(\Sigma)+\sqrt{\Tr(\Sigma)\|\Sigma\|_2}\ln(\tfrac{\ln(T)}{\delta})}{T}}$ with probability $1-\delta$, where $\Sigma$ is the covariance of the clipped gradient. Note that the fluctuations (depending on $\tfrac{1}{\delta}$) are of lower order than the term $\Tr(\Sigma)$. This improves upon the current best rate of $\sqrt{\frac{\Tr(\Sigma)\ln(\tfrac{1}{\delta})}{T}}$ for Clipped-SGD, known \emph{only} for smooth and strongly convex objectives. Our results also extend to smooth convex and Lipschitz convex objectives. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of \citet{catoni2018dimension}.
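To make the clipping mechanism the abstract refers to concrete, here is a minimal Python sketch of Clipped-SGD on a streaming mean-estimation instance (a smooth, strongly convex objective). The function name `clipped_sgd_mean`, the clipping threshold `lam`, and the $\eta/t$ step-size schedule are illustrative assumptions for this sketch, not the paper's tuned choices.

```python
import numpy as np

def clipped_sgd_mean(stream, lam, eta=1.0):
    """Clipped-SGD for streaming mean estimation (illustrative sketch).

    Minimizes F(theta) = 0.5 * E||theta - X||^2, whose minimizer is E[X].
    stream : iterable of d-dimensional samples (possibly heavy-tailed)
    lam    : gradient-norm clipping threshold (a tuning knob here)
    eta    : base step size; with eta = 1 and no clipping, the iterate
             reduces to the running sample mean
    """
    theta = None
    for t, x in enumerate(stream, start=1):
        if theta is None:
            theta = np.zeros_like(x, dtype=float)
        g = theta - x                       # stochastic gradient at theta
        norm = np.linalg.norm(g)
        if norm > lam:                      # clip: rescale gradient to norm lam
            g = g * (lam / norm)
        theta = theta - (eta / t) * g       # 1/t schedule for strong convexity
    return theta

# Heavy-tailed demo: Student-t noise (df = 2.5, so the second moment is
# finite but higher moments are not, matching the paper's assumption)
rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0, 0.5])
samples = true_mean + rng.standard_t(df=2.5, size=(10_000, 3))
print(clipped_sgd_mean(iter(samples), lam=5.0))  # close to true_mean
```

The point mirrored from the abstract is that clipping caps the influence of rare, very large gradients, which is what lets the high-probability error scale with the covariance of the clipped gradient rather than with the raw heavy tail.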
Primary Area: Optimization (convex and non-convex, discrete, stochastic, robust)
Submission Number: 2476
