A New Theoretical Perspective on Data Heterogeneity in Federated Optimization

Published: 19 Jun 2023, Last Modified: 21 Jul 2023, FL-ICML 2023
Keywords: Federated Learning, Data Heterogeneity, Theoretical Analysis
TL;DR: This paper bridges the gap between pessimistic theoretical results and strong experimental performance of federated algorithms by introducing a new perspective on data heterogeneity.
Abstract: In federated optimization, data heterogeneity is the main reason that existing theoretical analyses are pessimistic about the convergence error caused by local updates. However, experimental results have shown that more local updates can improve the convergence rate and reduce the communication cost even when data are heterogeneous. This paper bridges the gap between theoretical understanding and practical performance by providing a general theoretical analysis of federated averaging (FedAvg) with non-convex objective functions from a new perspective on data heterogeneity. After identifying the limitations of the commonly used bounded-gradient-divergence assumption, we propose a new assumption, termed the heterogeneity-driven Lipschitz assumption, which characterizes the fundamental effect of data heterogeneity on local updates. We find that the widely used local Lipschitz constant is affected by data heterogeneity, an effect neglected in the literature. The proposed heterogeneity-driven Lipschitz constant captures the information about data heterogeneity contained in the local Lipschitz constant, while the information about gradient smoothness is captured by the global Lipschitz assumption. Based on the new assumption, we derive novel convergence bounds for both full participation and partial participation, which are tighter and show that more local updates can improve the convergence rate even when data are highly heterogeneous. Furthermore, the assumptions used in this paper are weaker than those used in the literature.
Submission Number: 75