On last-iterate convergence of distributed Stochastic Gradient Descent algorithms with momentum

28 Sept 2024 (modified: 13 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: stochastic optimization, convergence analysis, distributed, momentum
TL;DR: In this paper, we aim to establish the last-iterate convergence theory for a class of distributed mSGD algorithms with a decaying learning rate.
Abstract: Distributed stochastic gradient optimization algorithms have been studied extensively to address challenges of centralized approaches, such as data privacy, communication load, and computational efficiency, especially when dealing with large datasets. However, convergence theory for these algorithms remains limited, particularly for distributed momentum-based SGD (mSGD) algorithms. Existing theoretical work on distributed mSGD focuses primarily on time-average convergence, whereas last-iterate convergence, a stronger and more practical notion than time-average convergence, has yet to be thoroughly explored. In this paper, we establish last-iterate convergence theory for a class of distributed mSGD algorithms with a decaying learning rate. First, we propose a general framework for distributed mSGD algorithms. Within this framework, and under general conditions, we prove last-iterate convergence of the gradient of the loss function for a class of distributed mSGD algorithms. Furthermore, we estimate the corresponding last-iterate convergence rate under supplementary conditions. Moreover, we prove theoretically that, in the early stage, adding a momentum term makes the iterates converge more rapidly to a neighborhood of a stationary point. Experiments are provided to illustrate the theoretical findings.
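To make the algorithm class concrete, the sketch below shows one possible synchronous distributed mSGD round with a decaying learning rate. It is a minimal illustration only: the paper's general framework is not reproduced here, and the mixing matrix `W`, the momentum coefficient `beta`, and the specific decay schedule `eta0 / (t + 1)**0.6` are all illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def distributed_msgd_step(x, v, stoch_grads, W, t, eta0=0.1, beta=0.9):
    """One synchronous distributed mSGD round for all agents (illustrative sketch).

    x           : (n, d) array, each row is one agent's parameter vector
    v           : (n, d) array, each agent's momentum buffer
    stoch_grads : (n, d) array, stochastic gradients of the local losses at x
    W           : (n, n) doubly stochastic mixing (gossip) matrix
    """
    eta_t = eta0 / (t + 1) ** 0.6      # hypothetical decaying learning rate
    v = beta * v + stoch_grads         # heavy-ball momentum accumulation
    x = W @ x - eta_t * v              # consensus (mixing) step + local descent
    return x, v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4, 2                                  # 4 agents, 2-dimensional parameter
    x = rng.normal(size=(n, d))
    v = np.zeros((n, d))
    W = np.full((n, n), 1.0 / n)                 # fully connected averaging network
    targets = rng.normal(size=(n, d))            # each agent's local quadratic minimizer
    for t in range(200):
        grads = x - targets + 0.01 * rng.normal(size=(n, d))  # noisy local gradients
        x, v = distributed_msgd_step(x, v, grads, W, t)
    print("consensus parameter estimate:", x.mean(axis=0))
```

In this toy run the agents' iterates mix toward consensus while the decaying step size damps the gradient noise, which is the regime in which the paper's last-iterate guarantees are stated.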
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13490