Gradual Stochastic Gradient Descent: from signSGD to SGD via $\ell_p$ Norm

Published: 02 Mar 2026, Last Modified: 16 Mar 2026, Sci4DL 2026, CC BY 4.0
Keywords: Stochastic Gradient Descent, signSGD, SGD
Abstract: The research community has long sought an optimizer that converges as quickly as Adam in the early stage of training while achieving the strong generalization of SGD in the later stage. In this paper, we present a novel and practical approach toward this goal. Recent studies have shown that Adam can be viewed as a smoothed version of sign stochastic gradient descent (signSGD), i.e., steepest descent under an $\ell_\infty$ norm ball constraint, whereas SGD can be regarded as steepest descent under an $\ell_2$ norm ball constraint. Inspired by this perspective, we propose the Gradual Norm Optimization framework and design the Gradual Stochastic Gradient Descent (GSGD) algorithm, which lets the optimizer transition smoothly from sign-based stochastic gradient descent in the early phase to standard stochastic gradient descent at the end. GSGD requires modifying only a single line of the original SGD implementation. We conduct preliminary evaluations of GSGD on the CIFAR-10 dataset, and the experimental results show that it exhibits fast convergence comparable to Adam and signSGD in the early stage while retaining the generalization performance of SGD in the later stage.
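To make the "single-line change" concrete, here is a minimal sketch of how an $\ell_p$-interpolated update could look, assuming the steepest-descent reading above: the per-coordinate step is $\mathrm{sign}(g)\,|g|^{\alpha}$, where $\alpha=0$ recovers signSGD ($\ell_\infty$) and $\alpha=1$ recovers plain SGD ($\ell_2$). The function name `gsgd_update`, the linear `alpha` schedule, and the toy objective are our own illustrative assumptions, not the authors' code.

```python
import numpy as np

def gsgd_update(w, grad, lr, alpha):
    """One hypothetical GSGD step under the l_p steepest-descent view.

    alpha = 0 -> step is sign(grad)   (signSGD, l_inf steepest descent)
    alpha = 1 -> step is grad itself  (plain SGD, l_2 steepest descent)
    Intermediate alpha corresponds to an l_p norm with dual exponent q = 1 + alpha.
    """
    step = np.sign(grad) * np.abs(grad) ** alpha  # the "single line" that changes
    return w - lr * step

# Toy run: anneal alpha from 0 (sign-like) to 1 (SGD-like) over training.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
for t in range(100):
    grad = 2 * w                    # gradient of the toy objective ||w||^2
    alpha = min(1.0, t / 50)        # assumed linear schedule for the exponent
    w = gsgd_update(w, grad, lr=0.05, alpha=alpha)
print(w)                            # should be close to zero
```

Any schedule that moves `alpha` (equivalently, the norm exponent $p$) from the $\ell_\infty$ regime toward $\ell_2$ over training would fit the framework described in the abstract; the paper's actual schedule and normalization may differ.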
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Submission Number: 2