## Accelerating Adaptive Federated Optimization with Local Gossip Communications

23 Sept 2022, 13:23 (modified: 29 Nov 2022, 19:40)FL-NeurIPS 2022 PosterReaders: Everyone
Keywords: Federated learning, Nonconvex optimization
Abstract: Recently, adaptive federated optimization methods, such as FedAdam and FedAMSGrad, have gained increasing attention for their fast convergence and stable performance especially in training models with heavy-tail stochastic gradient distributions. However, the implementation of such methods still faces several bottlenecks, such as the large client-to-server communication overhead and the intense sensitivity to heterogeneous data. More importantly, the two objectives may conflict with each other, i.e., the convergence rate gets worse as the number of local steps increases in the partial participation setting, making it challenging to further improve the efficiency of adaptive federated optimization. We refer this problem as the \textit{dilemma of local steps}. In this paper, we propose a novel hybrid adaptive federated optimization method (HA-Fed) where the clients are partitioned into disjoint clusters inside which they can communicate by fast client-to-client links. We show that HA-Fed resolves the \textit{dilemma of local steps} in prior adaptive federated optimization methods, i.e., achieves a faster convergence rate as the local steps increases, while reducing the client-to-server communication overhead under non-i.i.d. settings. Specifically, HA-Fed improves the convergence rate from $\mathcal{O}(\sqrt{\tau}/\sqrt{TM})$ in FedAMSGrad to $\mathcal{O}(1/\sqrt{T\tau M})$ in partial participation scenarios under nonconvex stochastic setting. Extensive experiments and ablation studies demonstrate the effectiveness and broad applicability of our proposed method.
Is Student: Yes
4 Replies