Keywords: Decentralized FL, Communication topology, Heterogeneous data, Graph learning
TL;DR: We provide a refined analysis of decentralized SGD that highlights the interplay between the communication topology and data heterogeneity, and offers a natural criterion for learning a good topology.
Abstract: One of the key challenges in decentralized and federated learning is to design algorithms that efficiently handle highly heterogeneous data distributions across agents. In this paper, we revisit the analysis of the Decentralized Stochastic Gradient Descent (D-SGD) algorithm under data heterogeneity. We first highlight the key role that a new quantity, called neighborhood heterogeneity, plays in the convergence rate of D-SGD. Neighborhood heterogeneity provides a natural criterion for learning data-dependent, sparse topologies that reduce the detrimental effect of data heterogeneity on the convergence of D-SGD. For the important case of classification with label skew, we formulate the problem of learning a topology as a tractable optimization problem, which we solve with a Frank-Wolfe algorithm. Our experiments show that the learned sparse topology strikes a balance between the convergence speed and the per-iteration communication cost of D-SGD.
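To make the setting concrete, here is a minimal NumPy sketch of the standard D-SGD update: each agent takes a local stochastic gradient step, then averages its parameters with its neighbors through a doubly stochastic mixing matrix W that encodes the topology. The quadratic local objectives, the ring and complete-graph mixing matrices, and the names `ring_mixing_matrix`, `d_sgd` and `disagreement` are hypothetical choices made for illustration. This is not the authors' implementation, and it computes neither neighborhood heterogeneity nor a learned topology. It only shows, on a toy problem, that with heterogeneous local data and a constant step size a sparse ring leaves a residual disagreement between agents, whereas full averaging does not.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring (hypothetical example topology).

    Each agent averages itself and its two neighbors with weight 1/3 (assumes n >= 3).
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W


def d_sgd(W, targets, lr=0.1, steps=500, noise=0.0, seed=0):
    """D-SGD on hypothetical local quadratics f_i(theta) = 0.5 * ||theta - c_i||^2.

    Row i of `targets` is c_i; distinct rows model heterogeneous local data.
    Each round, every agent takes a local (optionally noisy) gradient step,
    then replaces its parameters with a W-weighted average of its neighbors'.
    """
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    theta = np.zeros((n, d))  # row i holds agent i's parameters
    for _ in range(steps):
        grads = theta - targets  # exact local gradients of the quadratics
        grads = grads + noise * rng.standard_normal((n, d))  # optional gradient noise
        theta = W @ (theta - lr * grads)  # local step, then gossip averaging
    return theta


def disagreement(theta):
    """Largest deviation of any agent's parameters from the network average."""
    return float(np.max(np.abs(theta - theta.mean(axis=0))))
```

For example, with six agents on a ring whose targets are `np.arange(6, dtype=float)[:, None] * np.ones((1, 2))`, the network average of the iterates converges to the mean target because W is doubly stochastic, but individual agents stay apart. Replacing W with the complete-graph matrix `np.full((6, 6), 1 / 6)` removes that disagreement at the price of all-to-all communication, which is the kind of trade-off between topology density and heterogeneity that the abstract refers to.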
Is Student: No