Reducing Communication in Nonconvex Federated Learning with a Novel Single-Loop Variance Reduction Method
Keywords: variance reduction, federated learning
TL;DR: With a novel single-loop variance reduction method, we control client sampling error in nonconvex FL and achieve improved convergence rates.
Abstract: In Federated Learning (FL), inter-client heterogeneity causes two types of error: (i) \emph{client drift error}, induced by multiple local updates, and (ii) \emph{client sampling error}, due to the partial participation of clients at each communication round. While several solutions have been proposed for the former, there is still much room for improvement on the latter.
We provide a fundamental solution to the client sampling error. The key is a novel single-loop variance reduction algorithm, SLEDGE (Single-Loop mEthoD for Gradient Estimator), which does not require periodic computation of the full gradient yet achieves near-optimal gradient complexity in the nonconvex finite-sum setting. While sampling only a small number of clients at each communication round, the proposed FL algorithm, FLEDGE, provably requires fewer, or at most as many, communication rounds as any existing method to find first- and even second-order stationary points, both in the general nonconvex setting and under the PL condition. Moreover, when the Hessian heterogeneity between clients is small, the required number of communication rounds approaches $\tilde{\Theta}(1)$.
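To illustrate what "single-loop variance reduction without periodic full-gradient computation" means in general, the sketch below shows a generic STORM/SARAH-style recursive gradient estimator. This is not the paper's SLEDGE update (which is not reproduced on this page); the oracle `grad` and sampler `sample_batch` are hypothetical placeholders.

```python
import numpy as np

def single_loop_vr_sgd(grad, sample_batch, w_init, steps, lr=0.1, momentum=0.9):
    """Minimal sketch of a generic single-loop variance-reduced method
    (STORM-style recursion), used only to illustrate the idea of maintaining
    a low-variance gradient estimate without periodic full-gradient passes.
    NOT the SLEDGE/FLEDGE update from the paper.

    grad(w, batch): hypothetical stochastic-gradient oracle.
    sample_batch(): hypothetical minibatch (or client) sampler.
    """
    w = np.asarray(w_init, dtype=float)
    batch = sample_batch()
    v = grad(w, batch)                      # initial minibatch estimate
    for _ in range(steps):
        w_next = w - lr * v                 # step with the current estimate
        batch = sample_batch()
        # Recursive correction: the same batch is evaluated at both iterates,
        # so the estimator's error contracts without a full-gradient pass.
        v = grad(w_next, batch) + (1.0 - momentum) * (v - grad(w, batch))
        w = w_next
    return w
```

In an FL setting, `sample_batch` would correspond to drawing a small subset of clients per round; the recursive correction term is what keeps the client sampling error controlled across rounds in this family of estimators.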