Revisiting Consensus Error: A Fine-grained Analysis of Local SGD under Second-order Data Heterogeneity
Keywords: Federated Averaging, Local SGD, Convergence Analysis, Communication Complexity, Data Heterogeneity
TL;DR: Local SGD converges faster under low second-order heterogeneity, and we prove it with tight bounds and supporting experiments.
Abstract: Local SGD, or Federated Averaging, is one of the most widely used algorithms for distributed optimization. Although it often outperforms alternatives such as mini-batch SGD, existing theory has not fully explained this advantage under realistic assumptions about data heterogeneity. Recent work has suggested that a second-order heterogeneity assumption may suffice to justify the empirical gains of local SGD. We confirm this conjecture by establishing new upper and lower bounds on the convergence of local SGD. These bounds demonstrate how low second-order heterogeneity, combined with third-order smoothness, enables local SGD to interpolate between the heterogeneous and homogeneous regimes while maintaining communication efficiency. Our main technical contribution is a refined analysis of the consensus error, a quantity central to such results. We validate our theory with experiments on a distributed linear regression task.
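As a rough illustration of the setting the abstract describes, the following is a minimal sketch of Local SGD (Federated Averaging) on a synthetic distributed linear regression task, including a measurement of the consensus error (the dispersion of local iterates around their average). The client count M, local-step count K, step size, and heterogeneity scale are illustrative assumptions, not the paper's actual experimental configuration.

```python
# Minimal sketch of Local SGD / FedAvg on distributed linear regression.
# All hyperparameters below are hypothetical choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
M, n, d = 8, 200, 10          # clients, samples per client, dimension
R, K, lr = 50, 10, 0.05       # communication rounds, local steps, step size

# Heterogeneous clients: each local optimum perturbs a shared w_star.
w_star = rng.normal(size=d)
clients = []
for m in range(M):
    w_m = w_star + 0.1 * rng.normal(size=d)   # 0.1 controls heterogeneity
    X = rng.normal(size=(n, d))
    y = X @ w_m + 0.01 * rng.normal(size=n)
    clients.append((X, y))

def local_sgd_steps(w, X, y):
    """Run K stochastic gradient steps on one client's squared loss."""
    for _ in range(K):
        i = rng.integers(n)
        grad = (X[i] @ w - y[i]) * X[i]       # grad of 0.5*(x_i^T w - y_i)^2
        w = w - lr * grad
    return w

w = np.zeros(d)
for r in range(R):
    # Each client starts from the shared iterate and runs K local steps.
    local_models = [local_sgd_steps(w.copy(), X, y) for X, y in clients]
    # Communication: average local models (the Federated Averaging step).
    w = np.mean(local_models, axis=0)

# Consensus error at the final round: mean squared deviation from the average.
consensus = np.mean([np.linalg.norm(wm - w) ** 2 for wm in local_models])
print(f"distance to shared optimum: {np.linalg.norm(w - w_star):.4f}")
print(f"final-round consensus error: {consensus:.6f}")
```

Shrinking the 0.1 perturbation scale makes the clients' objectives more similar, which in this toy setup reduces the consensus error and lets larger K (fewer communication rounds) suffice, consistent with the interpolation behavior the abstract describes.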
Supplementary Material: zip
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 26560