Impact of Agent Behavior in Distributed SGD and Federated Learning

15 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Distributed Optimization, Sampling Strategy, Distributed SGD, Federated Learning, Central Limit Theorem
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We provide an asymptotic analysis of the generalized distributed SGD, including distributed SGD and its variants in Federated Learning, to study the impact of agents' sampling strategies on the overall convergence speed.
Abstract: Distributed learning has gained significant interest recently, as the growing amount of distributed data allows machine learning models to be trained across a set of *heterogeneous* agents in a privacy-preserving manner. In this paper, we conduct an asymptotic analysis of Generalized Distributed SGD (GD-SGD) under various communication patterns among agents, including Distributed SGD (D-SGD) and its variants in Federated Learning (FL), as well as increasing communication intervals in the FL setting. We examine the influence of agents' sampling strategies, such as *i.i.d.* sampling, shuffling methods and Markovian sampling, on the overall convergence speed of GD-SGD. We prove that all agents asymptotically reach consensus and identify the optimal model parameter, and we analyze the impact of sampling strategies on the limiting covariance matrix that appears in the Central Limit Theorem (CLT). Our results theoretically and empirically support recent findings on linear speedup and asymptotic network independence, and generalize previous findings on efficient Markovian sampling strategies from vanilla SGD to GD-SGD. Overall, our results provide a deeper understanding of the convergence speed of GD-SGD and emphasize the role of *each* agent's sampling strategy, moving beyond the focus on the worst-case agent common in the existing literature.
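To make the setting concrete, the following is a minimal sketch (not the paper's algorithm or notation; all names, step sizes, and the quadratic objective are assumptions for illustration) of D-SGD in which each heterogeneous agent performs a local SGD step using its own sampling strategy — *i.i.d.* sampling or a simple Markovian lazy walk over its local data — followed by consensus averaging through a doubly stochastic gossip matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: N agents, each holding a small heterogeneous dataset.
# For f(x) = (1/N) * sum_i E_{d ~ D_i}[(x - d)^2 / 2], the global optimum
# is the mean of all agents' data.
N, T, step = 4, 5000, 0.02
data = [rng.normal(loc=i, scale=1.0, size=20) for i in range(N)]
x_star = np.mean(np.concatenate(data))

# Doubly stochastic gossip matrix over a ring (the communication pattern).
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i + 1) % N] = 0.25
    W[i, (i - 1) % N] = 0.25

def iid_sampler(local_data):
    # i.i.d. sampling: each sample drawn uniformly at random.
    while True:
        yield local_data[rng.integers(len(local_data))]

def markov_sampler(local_data, stay=0.5):
    # A lazy random walk over data indices: samples are correlated in time,
    # but the stationary distribution is still uniform over the local data.
    idx = 0
    while True:
        yield local_data[idx]
        if rng.random() > stay:
            idx = rng.integers(len(local_data))

# Each agent chooses its own sampling strategy.
samplers = [iid_sampler(data[0]), markov_sampler(data[1]),
            iid_sampler(data[2]), markov_sampler(data[3])]

x = np.zeros(N)  # one model parameter per agent
for t in range(T):
    # Local SGD step per agent, then consensus averaging via W.
    grads = np.array([x[i] - next(samplers[i]) for i in range(N)])
    x = W @ (x - step * grads)

print(x.round(2), round(x_star, 2))
```

With a constant step size the agents' parameters hover near the global optimum rather than converge exactly, but the run illustrates the two phenomena the paper analyzes: all agents reach approximate consensus regardless of their individual sampling strategies, and the correlated (Markovian) samplers inflate the fluctuation around the optimum relative to i.i.d. sampling.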
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 412