Abstract: Federated learning (FL) trains a model jointly across a set of participating devices without
requiring them to share their privately held data. Non-i.i.d. data across the network, low device
participation, high communication costs, and the mandate that data remain private make it
challenging to understand the convergence of FL algorithms, particularly how convergence
scales with the number of participating devices. In
this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective
FL algorithms in use today, and conduct a systematic study of how its convergence scales with
the number of participating devices under non-i.i.d. data and partial participation in convex
settings. We provide a unified analysis that establishes convergence guarantees for FedAvg
under strongly convex, convex, and overparameterized strongly convex problems. We show
that FedAvg enjoys linear speedup in each case, although with different convergence rates and
communication efficiencies. For strongly convex and convex problems, we also characterize
the corresponding convergence rates for the Nesterov accelerated FedAvg algorithm, which
are the first linear speedup guarantees for momentum variants of FedAvg in convex settings.
Empirical studies of the algorithms in various settings support our theoretical results.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sebastian_U_Stich1
Submission Number: 542