Abstract: This paper aims to achieve faster than O(1/t) convergence in federated learning for general
convex loss functions. Under the independent and identically distributed (IID) condition, we
show that accurate convergence to an optimal solution can be achieved in convex federated
learning even when individual clients select stepsizes locally without any coordination. More
importantly, this local stepsize strategy allows exploitation of the local geometry of individual
clients’ loss functions, and is shown to lead to faster convergence than the case where
the same universal stepsize is used for all clients. When the data distribution is non-IID,
we share gradients in addition to the global model parameter to ensure o(1/t)
convergence to an optimal solution in convex federated learning. For both algorithms, we
theoretically prove that much larger stepsizes are allowed than in existing counterparts,
which leads to much faster convergence in empirical evaluations. It is worth noting
that, beyond providing a general framework for federated learning with drift correction, our
second algorithm achieves o(1/t) convergence to the exact optimal solution under general
convex loss functions, a result that has not been previously reported in the federated learning
literature except in certain restricted convex cases with additional constraints. We believe
that this is significant because even after incorporating momentum, existing first-order
federated learning algorithms can only ensure O(1/t) convergence for general convex loss
functions when no additional assumptions on heterogeneity are imposed.
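To make the two mechanisms described above concrete, the following is a minimal, hypothetical Python sketch of (a) a FedAvg-style round in which each client applies its own locally chosen stepsize and (b) a non-IID variant in which clients also share gradients so that local steps include a drift-correction term. The function names (local_update, federated_round_iid, federated_round_noniid), the SCAFFOLD-style correction, and all parameter values are illustrative assumptions for exposition only; they do not reproduce the paper's actual algorithms or stepsize rules.

import numpy as np

def local_update(x_global, grad_fn, local_stepsize, num_local_steps):
    # One client's local training pass with its own stepsize (IID-style variant).
    x = x_global.copy()
    for _ in range(num_local_steps):
        x -= local_stepsize * grad_fn(x)  # plain local gradient step
    return x

def federated_round_iid(x_global, clients):
    # Server averages models trained with client-chosen stepsizes.
    local_models = [local_update(x_global, c["grad_fn"], c["stepsize"], c["steps"])
                    for c in clients]
    return np.mean(local_models, axis=0)

def federated_round_noniid(x_global, clients):
    # Non-IID-style variant: clients also share their gradients at the global model,
    # and each local step is nudged toward the average gradient direction.
    grads = [c["grad_fn"](x_global) for c in clients]  # shared gradients
    avg_grad = np.mean(grads, axis=0)
    local_models = []
    for c, g_i in zip(clients, grads):
        x = x_global.copy()
        for _ in range(c["steps"]):
            # local gradient plus a SCAFFOLD-style drift-correction term
            x -= c["stepsize"] * (c["grad_fn"](x) - g_i + avg_grad)
        local_models.append(x)
    return np.mean(local_models, axis=0)

if __name__ == "__main__":
    # Toy usage: two clients with quadratic losses of different curvature,
    # each using a stepsize matched to its own local geometry.
    clients = [
        {"grad_fn": lambda x: 2.0 * (x - 1.0), "stepsize": 0.2, "steps": 5},
        {"grad_fn": lambda x: 0.5 * (x + 1.0), "stepsize": 0.8, "steps": 5},
    ]
    x = np.zeros(1)
    for _ in range(50):
        x = federated_round_noniid(x, clients)
    print("approximate minimizer of the aggregate loss:", x)

In this toy run the two clients have different curvatures, which is precisely the situation where letting each client pick a stepsize matched to its own local geometry can help relative to a single universal stepsize; the drift-correction term in the non-IID variant keeps the locally updated models from diverging toward their individual minimizers.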
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Dan_Garber1
Submission Number: 6524