Keywords: convex federated learning, convergence rate
Abstract: This paper aims to achieve faster than $O(1/t)$ convergence in federated learning for general convex loss functions. Under the independent and identically distributed (IID) condition, we show that accurate convergence to an optimal solution can be achieved in convex federated learning even when individual clients select stepsizes locally without any coordination. More importantly, this local stepsize strategy allows exploitation of the local geometry of individual clients' loss functions, and is shown to lead to faster convergence than the case where a single universal stepsize is used for all clients. Then, when the distribution is non-IID, we share gradients in addition to the global model parameter to ensure $o(1/t)$ convergence to an optimal solution in convex federated learning. For both algorithms, we theoretically prove that stepsizes much larger than those of existing counterparts are allowed, which leads to much faster convergence in empirical evaluations. It is worth noting that, beyond providing a general framework for federated learning with drift correction, our second algorithm's achievement of $o(1/t)$ convergence to the exact optimal solution under general convex loss functions has not been previously reported in the federated learning literature, except in certain restricted convex cases with additional constraints. We believe that this is significant because, even after incorporating momentum, existing first-order federated learning algorithms can only ensure $O(1/t)$ convergence for general convex loss functions when no additional assumptions on heterogeneity are imposed.
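The intuition behind locally chosen stepsizes can be illustrated with a toy example. The sketch below is not the paper's algorithm. It assumes diagonal quadratic client losses $f_i(x) = \frac{1}{2}\sum_j a_{ij}(x_j - b_j)^2$ that share one minimizer $b$ (a simplified stand-in for the IID setting), one local gradient step per round, and plain server averaging. All names (`local_step`, `federated_run`, `clients`, `target`) are hypothetical. Each client either uses its own stepsize $1/L_i$, with $L_i = \max_j a_{ij}$, or the universal stepsize $1/\max_i L_i$.

```python
def local_step(x, curvatures, target, stepsize):
    """One gradient step on f(x) = 0.5 * sum_j a_j * (x_j - b_j)^2."""
    return [xj - stepsize * a * (xj - b)
            for xj, a, b in zip(x, curvatures, target)]


def federated_run(clients, target, x0, rounds, stepsizes):
    """Each round: every client takes one local step, the server averages."""
    x = list(x0)
    for _ in range(rounds):
        updates = [local_step(x, a, target, s)
                   for a, s in zip(clients, stepsizes)]
        # Coordinate-wise average of the client models.
        x = [sum(col) / len(updates) for col in zip(*updates)]
    return x


def error(x, target):
    """Largest coordinate-wise distance to the shared minimizer."""
    return max(abs(xj - b) for xj, b in zip(x, target))


# Hypothetical toy problem: three clients with different curvatures
# (assumed values) and a common minimizer.
clients = [[1.0, 0.5], [4.0, 2.0], [10.0, 1.0]]
target = [1.0, -2.0]
x0 = [0.0, 0.0]

# Each client's smoothness constant is its largest curvature.
smoothness = [max(a) for a in clients]

# Local rule: client i uses 1 / L_i, chosen without coordination.
local_stepsizes = [1.0 / L for L in smoothness]
# Universal rule: every client uses 1 / max_i L_i.
universal_stepsizes = [1.0 / max(smoothness)] * len(clients)

x_local = federated_run(clients, target, x0, 20, local_stepsizes)
x_universal = federated_run(clients, target, x0, 20, universal_stepsizes)

print("error with local stepsizes:    ", error(x_local, target))
print("error with universal stepsize: ", error(x_universal, target))
```

In this toy setting the universal stepsize is dictated by the sharpest client, so flatter clients move more slowly than their own geometry would permit. Per-client stepsizes avoid that bottleneck, which is the effect the abstract attributes to exploiting local geometry.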
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 12675