Momentum Ensures Convergence of SIGNSGD under Weaker Assumptions

Published: 24 Apr 2023, Last Modified: 21 Jun 2023, ICML 2023 Poster
Abstract: Sign Stochastic Gradient Descent (signSGD) is a communication-efficient stochastic algorithm that uses only the sign information of the stochastic gradient to update the model's weights. However, the existing convergence theory of signSGD either requires increasing batch sizes during training or assumes the gradient noise is symmetric and unimodal. Error feedback has been used to guarantee the convergence of signSGD under weaker assumptions, at the cost of communication overhead. This paper revisits the convergence of signSGD and proves that momentum remedies the convergence of signSGD under weaker assumptions than previous techniques; in particular, our convergence theory requires neither bounded stochastic gradients nor an increasing batch size. Our results resonate with previous empirical observations that, unlike signSGD, signSGD with momentum maintains good performance even with small batch sizes. Another new result is that signSGD with momentum can achieve an improved convergence rate when the objective function is second-order smooth. We further extend our theory to signSGD with majority vote and to federated learning.
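For illustration only (not code from the paper), the sketch below shows the signSGD-with-momentum update the abstract refers to: a momentum buffer averages the stochastic gradients, and only the sign of that buffer is used in the step. The learning rate, momentum parameter, and the toy objective are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def signsgd_momentum_step(x, m, grad, lr=0.01, beta=0.9):
    """One signSGD-with-momentum update (illustrative sketch).

    x    : current parameters
    m    : momentum buffer (same shape as x)
    grad : stochastic gradient estimate at x
    lr and beta are illustrative hyperparameters, not values from the paper.
    """
    # Exponential moving average of the stochastic gradients.
    m = beta * m + (1.0 - beta) * grad
    # Only the sign of the momentum enters the update, so each coordinate
    # can be communicated with a single bit.
    x = x - lr * np.sign(m)
    return x, m


# Toy usage: minimize ||x||^2 from noisy gradients.
rng = np.random.default_rng(0)
x = np.array([5.0, -3.0])
m = np.zeros_like(x)
for _ in range(200):
    grad = 2.0 * x + rng.normal(scale=1.0, size=x.shape)  # noisy gradient of ||x||^2
    x, m = signsgd_momentum_step(x, m, grad)
print(x)  # close to the origin
```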
Submission Number: 426