Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
Keywords: Decentralized Optimization, Non-Convex Stochastic Optimization, Momentum Acceleration
TL;DR: We propose Momentum Tracking, a decentralized learning method with momentum acceleration whose convergence rate is provably independent of data heterogeneity.
Abstract: SGD with momentum acceleration is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward way to use momentum acceleration is Distributed SGD (DSGD) with momentum acceleration (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed methods with momentum acceleration that are more robust to data heterogeneity than DSGDm, although their convergence rates remain dependent on data heterogeneity and degrade when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, a method with momentum acceleration whose convergence rate is provably independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the standard deep learning setting, where the objective function is non-convex and stochastic gradients are used, and show that it is independent of data heterogeneity for any momentum coefficient $\beta \in [0, 1)$. Through image classification tasks, we demonstrate that Momentum Tracking is more robust to data heterogeneity than existing decentralized learning methods with momentum acceleration and consistently outperforms them when the data distributions are heterogeneous.
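The abstract describes Momentum Tracking only at a high level, so the snippet below is merely an illustrative sketch of how heavy-ball momentum might be combined with standard gradient tracking on a toy decentralized quadratic problem; it is not the paper's exact update rule, and all names (`local_grad`, `W`, `targets`) and the placement of the momentum buffer are assumptions for illustration.

```python
# Hypothetical sketch: gradient tracking + heavy-ball momentum on a toy
# decentralized problem. Not the paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 4, 10
eta, beta = 0.05, 0.9  # step size and momentum coefficient beta in [0, 1)

# Heterogeneous local objectives f_i(x) = 0.5 * ||x - b_i||^2 (different b_i per node)
targets = rng.normal(size=(n_nodes, dim)) * np.arange(1, n_nodes + 1)[:, None]

def local_grad(i, x):
    return x - targets[i]

# Doubly stochastic mixing matrix for a ring topology
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

x = np.zeros((n_nodes, dim))                                   # local parameters
c = np.stack([local_grad(i, x[i]) for i in range(n_nodes)])    # gradient trackers
m = np.zeros_like(x)                                           # momentum buffers

for step in range(200):
    g_old = np.stack([local_grad(i, x[i]) for i in range(n_nodes)])
    m = beta * m + c                      # momentum applied to the tracked gradient
    x = W @ (x - eta * m)                 # local step followed by gossip averaging
    g_new = np.stack([local_grad(i, x[i]) for i in range(n_nodes)])
    c = W @ c + g_new - g_old             # gradient-tracking correction

print("consensus error:", np.linalg.norm(x - x.mean(0)))
print("distance to optimum:", np.linalg.norm(x.mean(0) - targets.mean(0)))
```

In this sketch the tracker `c` estimates the average gradient across nodes, so applying momentum to `c` rather than to the raw local gradient is one plausible way to keep the update insensitive to how heterogeneous the local objectives are.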
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Optimization (eg, convex and non-convex optimization)
Supplementary Material: zip