On the Still Unreasonable Effectiveness of Federated Averaging for Heterogeneous Distributed Learning

Published: 19 Jun 2023, Last Modified: 21 Jul 2023, FL-ICML 2023
Keywords: Local SGD, Federated Averaging, Client Heterogeneity, Lower Bounds, Convergence Theory
TL;DR: We provide new lower bound results that underline the gaps between the theory and practice of using federated averaging.
Abstract: Federated Averaging (local SGD) is the most common optimization method for federated learning and has proven effective in many real-world applications, dominating simple baselines like mini-batch SGD for both convex and non-convex objectives. However, theoretically establishing the effectiveness of local SGD remains challenging, leaving a large gap between theory and practice. In this paper, we provide new lower bounds for local SGD on convex objectives, ruling out proposed heterogeneity assumptions that try to capture this "unreasonable" effectiveness of local SGD. We further show that accelerated mini-batch SGD is, in fact, min-max optimal under some of these heterogeneity notions. Our results indicate that strong convexity of each client's objective might be necessary to exploit several heterogeneity assumptions. This highlights the need for new heterogeneity assumptions for federated optimization in the general convex setting, and we discuss some alternative assumptions.
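For readers unfamiliar with the algorithm under study, the following is a minimal sketch of Local SGD / Federated Averaging. The client objectives, step size, and round counts are invented for illustration (simple quadratics, not from the paper): each round, every client takes several local gradient steps from the shared iterate, then the server averages the client models.

```python
# Minimal sketch of Local SGD / Federated Averaging (FedAvg).
# Illustrative only: each client i holds a quadratic objective
# f_i(w) = 0.5 * ||w - b_i||^2, a hypothetical stand-in for its local loss.
import numpy as np

def local_sgd(client_targets, rounds=50, local_steps=10, lr=0.1):
    """Each round: every client runs `local_steps` gradient steps starting
    from the shared iterate; the server then averages the client models."""
    w = np.zeros_like(client_targets[0])
    for _ in range(rounds):
        client_models = []
        for b in client_targets:
            w_i = w.copy()
            for _ in range(local_steps):
                grad = w_i - b          # gradient of 0.5 * ||w - b||^2
                w_i -= lr * grad        # local gradient step
            client_models.append(w_i)
        w = np.mean(client_models, axis=0)  # server-side averaging
    return w

# Heterogeneous clients: each b_i is a different local minimizer.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
w_star = local_sgd(targets)
# For these quadratics the average objective is minimized at mean(b_i),
# so w_star converges to [0.5, 0.5].
```

Mini-batch SGD, the baseline the abstract contrasts with, would instead take a single averaged-gradient step per communication round (`local_steps=1` with the gradients averaged before the update); local SGD's multiple local steps per round are what make its analysis under client heterogeneity difficult.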
Submission Number: 99