Keywords: decentralized optimization, gossip averaging, spectral gap
TL;DR: We improve the convergence analysis of Decentralized SGD and reveal more precisely how the topology affects the convergence rate.
Abstract: Decentralized SGD is a fundamental algorithm in decentralized learning, yet the influence of the underlying network topology on its convergence behavior is not fully understood. Existing convergence analyses have shown that topologies with a small spectral gap significantly deteriorate the convergence rate of Decentralized SGD in both homogeneous and heterogeneous cases. However, many prior papers have reported that, experimentally, the choice of topology has a significant impact in the heterogeneous case but little impact on training behavior in the homogeneous case. In this paper, we present a tighter convergence analysis of Decentralized SGD for both convex and non-convex cases, offering a more precise understanding of how the topology affects the convergence rate than prior analyses. Specifically, unlike existing convergence analyses that use only the spectral gap as a property of the topology, our analysis shows that all eigenvalues of the mixing matrix affect the convergence rate. This leads to the key insight that in homogeneous settings, the effect of topology on the convergence rate is notably smaller than expected from existing analyses, especially for commonly used topologies such as the ring and torus. In our experiments, we carefully evaluate the convergence behavior of Decentralized SGD and demonstrate that our convergence analysis describes the effect of topology on the convergence rate more accurately.
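The abstract contrasts the spectral gap with the full eigenvalue spectrum of the mixing matrix. Below is a minimal illustrative sketch (not the paper's code) that builds the standard uniform-weight ring mixing matrix, inspects its full spectrum alongside its spectral gap, and runs one gossip-averaging step of Decentralized SGD on a toy quadratic objective; the node count, step size, and local objectives are hypothetical choices made only for illustration.

```python
# Hypothetical sketch of Decentralized SGD on a ring topology; all
# constants and objectives below are illustrative assumptions.
import numpy as np

n = 16        # number of nodes (hypothetical)
lr = 0.1      # step size (hypothetical)
rng = np.random.default_rng(0)

# Ring topology: each node averages with itself and its two neighbors (weight 1/3).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

# Full eigenvalue spectrum of the symmetric, doubly stochastic mixing matrix,
# and the spectral gap used by prior analyses.
eigs = np.sort(np.linalg.eigvalsh(W))[::-1]
spectral_gap = 1.0 - max(abs(eigs[1]), abs(eigs[-1]))
print(f"spectral gap: {spectral_gap:.4f}")
print(f"full spectrum: {np.round(eigs, 3)}")

# One Decentralized SGD step on f_i(x) = 0.5 * ||x - b_i||^2 per node:
# a local gradient step followed by gossip averaging with W.
d = 4                            # parameter dimension (hypothetical)
b = rng.normal(size=(n, d))      # heterogeneous local targets
x = rng.normal(size=(n, d))      # per-node parameters
grad = x - b                     # gradients of the toy local objectives
x = W @ (x - lr * grad)          # gossip-average the local updates
```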
Supplementary Material: zip
Primary Area: optimization
Submission Number: 15266