Boosting Asynchronous Decentralized Learning with Model Fragmentation

Sayan Biswas; Anne-Marie Kermarrec; Alexis Marouani; Rafael Pires; Rishi Sharma; Martijn De Vos

Published: 29 Jan 2025, Last Modified: 29 Jan 2025

Venue: WWW 2025 Oral

License: CC BY 4.0

Track: Systems and infrastructure for Web, mobile, and WoT

Keywords: Decentralized Learning, Collaborative Machine Learning, Asynchronous Decentralized Learning, Communication Stragglers

TL;DR: DivShare is an asynchronous decentralized learning algorithm that disseminates fragmented models to mitigate communication delays from straggling nodes, leading to faster convergence and improved model utility compared to state-of-the-art baselines.

Abstract: Decentralized learning (DL) is an emerging technique that allows nodes on the web to collaboratively train machine learning models without sharing raw data. Dealing with stragglers, i.e., nodes with slower compute or communication than others, is a key challenge in DL. We present DivShare, a novel asynchronous DL algorithm that achieves fast model convergence in the presence of communication stragglers. To do so, each node fragments its model into parameter subsets and, in parallel with computation, sends each subset to a random sample of other nodes instead of sequentially exchanging full models. Transferring smaller fragments uses the collective bandwidth more efficiently and lets nodes with slow network links quickly contribute at least some of their model parameters. By theoretically proving the convergence of DivShare, we provide, to the best of our knowledge, the first formal proof of convergence for a DL algorithm that accounts for the effects of asynchronous communication with delays. We experimentally evaluate DivShare against two state-of-the-art DL baselines, AD-PSGD and Swift, on two standard datasets, CIFAR-10 and MovieLens. In the presence of communication stragglers, DivShare lowers time-to-accuracy by up to 3.9x compared to AD-PSGD on the CIFAR-10 dataset. Compared to the baselines, DivShare also achieves up to 19.4% better accuracy on CIFAR-10 and up to 9.5% lower test loss on MovieLens.
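The fragment-and-disseminate step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`fragment_indices`, `plan_sends`), the random equal-size partitioning, the dictionary payload format, and the parameters `num_fragments` and `recipients_per_fragment` are all assumptions made for illustration. The sketch only plans which fragment goes to which node; the actual transfer, its overlap with computation, and the aggregation on the receiving side are left out.

```python
import random


def fragment_indices(num_params, num_fragments, rng):
    """Split parameter indices into disjoint random subsets (assumed scheme)."""
    indices = list(range(num_params))
    rng.shuffle(indices)
    # Strided slices give num_fragments disjoint subsets that cover every index.
    return [sorted(indices[k::num_fragments]) for k in range(num_fragments)]


def plan_sends(node_id, num_nodes, params, num_fragments,
               recipients_per_fragment, rng):
    """Return (recipient, payload) pairs for one node's dissemination step.

    Each payload maps parameter index -> value for a single fragment
    (hypothetical wire format, chosen for readability).
    """
    others = [n for n in range(num_nodes) if n != node_id]
    messages = []
    for fragment in fragment_indices(len(params), num_fragments, rng):
        payload = {i: params[i] for i in fragment}
        # Every fragment goes to its own independent random sample of peers.
        for recipient in rng.sample(others, recipients_per_fragment):
            messages.append((recipient, payload))
    return messages


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.1 * i for i in range(10)]  # toy flattened model
    for recipient, payload in plan_sends(0, 5, params, 3, 2, rng):
        print(f"to node {recipient}: parameter indices {sorted(payload)}")
```

Because each message carries only a fraction of the model, a node behind a slow link can complete some transfers within a round even when sending the full model would not finish in time, which is the intuition the abstract gives for the bandwidth benefit.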

Submission Number: 507
