General Weighted Averaging in Stochastic Gradient Descent: CLT and Adaptive Optimality

Published: 03 Feb 2026, Last Modified: 03 Feb 2026 · AISTATS 2026 Poster · CC BY 4.0
Abstract: Stochastic Gradient Descent (SGD) is a cornerstone of machine learning, prized for its efficiency in large-scale optimization. This paper revisits SGD by introducing a general weighted averaging framework that broadens its applicability. We establish asymptotic normality for a wide range of weighted averaged SGD solutions under minimal assumptions, and provide a necessary condition for the central limit theorem in certain settings. This enables asymptotically valid online inference, allowing confidence intervals to be constructed in real time. Furthermore, we propose an adaptive averaging scheme, inspired by the optimal weights for linear models, which achieves superior non-asymptotic bounds. Our theoretical results and empirical validations extend SGD’s capabilities and offer new insights for statistical learning and optimization.
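To illustrate the kind of averaging the abstract describes, here is a minimal NumPy sketch that runs plain SGD while maintaining an online weighted average of its iterates. This is not the paper's specific scheme: the function name `weighted_averaged_sgd`, the step-size schedule, and the polynomial weight sequence are all illustrative assumptions.

```python
import numpy as np

def weighted_averaged_sgd(grad, x0, n_steps,
                          lr=lambda t: 0.5 * (t + 1) ** -0.75,
                          weight=lambda t: (t + 1) ** 2):
    """Run SGD and keep a running weighted average of the iterates.

    `grad(x)` returns a stochastic gradient at x. The step-size and weight
    schedules are placeholders, not the weights proposed in the paper.
    """
    x = np.asarray(x0, dtype=float)
    w_sum = 0.0
    x_bar = np.zeros_like(x)
    for t in range(n_steps):
        x = x - lr(t) * grad(x)             # plain SGD update
        w = weight(t)                       # weight on the t-th iterate
        w_sum += w
        x_bar += (w / w_sum) * (x - x_bar)  # online weighted average
    return x_bar

# Toy usage: noisy gradients of the quadratic f(x) = 0.5 * ||x - 1||^2.
rng = np.random.default_rng(0)
noisy_grad = lambda x: (x - 1.0) + 0.1 * rng.standard_normal(x.shape)
print(weighted_averaged_sgd(noisy_grad, x0=np.zeros(3), n_steps=5000))
```

The running-average update keeps memory constant, which is what makes the online confidence-interval construction mentioned in the abstract feasible without storing past iterates.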
Submission Number: 534