Communication-Efficient Distributionally Robust Decentralized Learning

Published: 05 Jan 2023, Last Modified: 28 Feb 2023. Accepted by TMLR.
Abstract: Decentralized learning algorithms empower interconnected devices to share data and computational resources to collaboratively train a machine learning model without the aid of a central coordinator. When data distributions are heterogeneous across the network nodes, collaboration can yield predictors with unsatisfactory performance for a subset of the devices. For this reason, we consider the formulation of a distributionally robust decentralized learning task and propose a decentralized single-loop gradient descent/ascent algorithm (AD-GDA) to directly solve the underlying minimax optimization problem. We render our algorithm communication-efficient by employing a compressed consensus scheme, and we provide convergence guarantees for smooth convex and non-convex loss functions. Finally, we corroborate the theoretical findings with empirical results that highlight AD-GDA's ability to provide unbiased predictors and to greatly improve communication efficiency compared to existing distributionally robust algorithms.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. We revised the theory part, adding a section that explains the differences between the existing distributionally robust algorithms and the proposed one. We also highlighted the challenges that are peculiar to the derivation of AD-GDA's convergence guarantees, together with the technical tools employed.
2. We included Table 1, which compares the main features and convergence rates of AD-GDA, DRFA, and DR-DSGD.
3. We added a new table in Section 5.2.2 reporting the worst-node accuracy of AD-GDA, DR-DSGD, and DRFA, as requested.
4. We replaced Figures 3 and 4 with convergence plots that highlight the predicted sub-linear rate of AD-GDA, as well as the dependence of the slope on the compression levels and the spectral gaps of the mixing matrices.
5. We added more details about the experiments.
6. We expanded the related work section and included the suggested work on stochastic games.
7. We fixed the typos.
8. We clarified the limitations of Assumption 3.4.
Assigned Action Editor: ~Naman_Agarwal1
Submission Number: 473