On the Convergence of Local SGD Under Third-Order Smoothness and Hessian Similarity

Published: 26 Oct 2023, Last Modified: 13 Dec 2023, NeurIPS 2023 Workshop Poster
Keywords: Distributed optimization; Local SGD; Hessian similarity
Abstract: Local SGD (i.e., Federated Averaging without client sampling) is widely used for solving federated optimization problems in the presence of heterogeneous data. However, there is a gap between the existing convergence rates for Local SGD and its observed performance on real-world problems; current rates do not correctly capture the effectiveness of Local SGD. We first show that the existing rates for Local SGD in the heterogeneous setting cannot recover the correct rate when the global function is a quadratic. We then derive a new rate for the case where the global function is a general strongly convex function, which depends on third-order smoothness and Hessian similarity. These additional parameters allow us to capture the problem in a more refined way and to overcome some of the limitations of the previous worst-case results derived under the standard assumptions. Finally, we show a rate for Local SGD when all client functions are non-convex quadratics with identical Hessians.
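For readers unfamiliar with the algorithm, the following is a minimal sketch of Local SGD (Federated Averaging without client sampling) on heterogeneous quadratic client objectives. The quadratic objectives f_i(x) = 0.5 x'A_i x - b_i'x, the step size, the noise model, and the number of local steps are illustrative assumptions, not the paper's experimental setup.

import numpy as np

def local_sgd(A_list, b_list, x0, lr=0.1, local_steps=10, rounds=50, noise=0.01, seed=0):
    """Run Local SGD on quadratic clients f_i(x) = 0.5 x'A_i x - b_i'x."""
    rng = np.random.default_rng(seed)
    x_global = x0.copy()
    for _ in range(rounds):
        client_iterates = []
        for A_i, b_i in zip(A_list, b_list):
            x = x_global.copy()
            for _ in range(local_steps):
                # Stochastic gradient of the local quadratic (additive Gaussian noise).
                grad = A_i @ x - b_i + noise * rng.standard_normal(x.shape)
                x -= lr * grad
            client_iterates.append(x)
        # Communication round: average the local iterates across all clients (no sampling).
        x_global = np.mean(client_iterates, axis=0)
    return x_global

# Example: two heterogeneous quadratic clients in R^2.
A_list = [np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([[1.0, 0.3], [0.3, 3.0]])]
b_list = [np.array([1.0, -1.0]), np.array([0.5, 2.0])]
x_star = np.linalg.solve(sum(A_list), sum(b_list))  # minimizer of the average objective
x_hat = local_sgd(A_list, b_list, x0=np.zeros(2))
print("Local SGD estimate:", x_hat, " optimum:", x_star)

The quadratic setting above mirrors the paper's special case: when the clients' Hessians A_i coincide, the averaging step incurs no drift from local updates, which is the regime the Hessian-similarity analysis refines.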
Submission Number: 51