Abstract: Federated learning (FL) enables decentralized clients to train a model collaboratively without sharing local data. A key distinction between FL and centralized learning is that clients' data are not independent and identically distributed (non-IID), which poses significant challenges in training a global model that generalizes well across heterogeneous local data distributions. In this paper, we analyze the convergence of overparameterized FedAvg with gradient descent (GD). We prove that the impact of data heterogeneity diminishes as the width of neural networks increases, ultimately vanishing when the width approaches infinity. In the infinite-width regime, we further prove that both the global and local models in FedAvg behave as linear models, and that FedAvg achieves the same generalization performance as centralized learning with the same number of GD iterations. Extensive experiments validate our theoretical findings across various network architectures, loss functions, and optimization methods.
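For readers unfamiliar with the training procedure the abstract refers to, the following is a minimal sketch (not the paper's code) of FedAvg with full-batch gradient descent on a least-squares model. All names, data, and hyperparameters below are illustrative assumptions, not taken from the paper; they only show the local-GD-then-average structure being analyzed.

```python
# Minimal FedAvg sketch with full-batch GD on per-client squared losses.
# Everything here (client count, learning rate, local steps) is assumed
# for illustration and is not drawn from the paper.
import numpy as np

def local_gd(w, X, y, lr=0.1, local_steps=5):
    """Run `local_steps` full-batch GD steps on one client's squared loss."""
    w = w.copy()
    for _ in range(local_steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5 * mean squared error
        w -= lr * grad
    return w

def fedavg(clients, w0, rounds=20):
    """Each round: every client runs local GD starting from the global model,
    then the server averages the resulting local models."""
    w = w0.copy()
    for _ in range(rounds):
        local_models = [local_gd(w, X, y) for X, y in clients]
        w = np.mean(local_models, axis=0)  # server-side model averaging
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10
    # Heterogeneous (non-IID) clients: each draws targets from a different
    # ground-truth vector, so the local optima differ across clients.
    clients = []
    for _ in range(4):
        X = rng.normal(size=(50, d))
        w_true = rng.normal(size=d)
        clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))
    w_global = fedavg(clients, w0=np.zeros(d))
    print("global model norm:", np.linalg.norm(w_global))
```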
Lay Summary: Training artificial intelligence (AI) usually requires collecting vast amounts of data in one place, raising privacy concerns. Federated learning solves this problem by allowing multiple devices to jointly train a shared AI model without ever sharing their private data. However, because data on each device can be very different, it is difficult for federated learning models to perform equally well across all devices.
We studied how large neural networks handle this data heterogeneity in federated learning. Our analysis shows something surprising: as neural networks become wider, the impact of data heterogeneity shrinks significantly and eventually disappears completely for infinitely wide networks. In fact, infinitely wide neural networks trained via federated learning perform just as well as traditional AI models trained centrally.
Our findings help clarify how federated learning can achieve strong performance despite challenging data conditions, paving the way for more effective and privacy-friendly AI applications.
Link To Code: https://github.com/kkhuge/ICML2025
Primary Area: Deep Learning->Theory
Keywords: Federated learning, overparameterized neural networks, data heterogeneity
Submission Number: 5803