Tackling Data Heterogeneity in Federated Learning through Global Density Estimation

Published: 2024, Last Modified: 25 Sept 2025 · IEEE Big Data 2024 · CC BY-SA 4.0
Abstract: Federated learning is gaining popularity for its widely accepted privacy-protection paradigm. However, data heterogeneity among clients is often the primary challenge in federated learning, hindering the convergence of deep neural networks. The non-IID nature of data across clients escalates the computational cost and communication overhead of models trained locally on-device and shared for global averaging. To mitigate this issue, we preserve the statistical parameters of local clients by estimating the global density with a Gaussian mixture model. Our study focuses on preserving client data privacy while addressing the statistical heterogeneity in data distribution across clients. We put forward FedDpS, a federated implementation of a distribution-preserving sampling algorithm, to mitigate the high heterogeneity of data among clients and thereby facilitate faster convergence of DNN training. Local models are built at the client level using our on-device algorithm designed to tackle data heterogeneity. Significant improvements in test accuracy, F1-score, and other evaluation metrics are observed when FedDpS is combined with state-of-the-art optimization methods such as FedAvg, FedProx, FedAdam, FedAwS, and MOON. Our proposed method reaches the target performance in fewer communication rounds, reducing the overall communication cost. The code for our implementation is available at https://github.com/sagnik04g/FedDpS.
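To illustrate the idea of estimating a global density from per-client statistics without sharing raw data, here is a minimal, hypothetical sketch (not the authors' FedDpS implementation): each client summarizes its data as a single Gaussian component, and the server treats the collection as a mixture, weighted by client data size, from which it can draw distribution-preserving samples. Function names, the one-Gaussian-per-client simplification, and the toy data are all assumptions for illustration.

```python
import numpy as np

def local_stats(X):
    """Client-side (hypothetical): summarize local data as a Gaussian
    (mean, covariance, count). Only these statistics leave the device,
    never the raw samples."""
    return X.mean(axis=0), np.cov(X, rowvar=False), len(X)

def sample_global(stats, n_samples, seed=0):
    """Server-side (hypothetical): treat the clients' Gaussians as one
    global mixture, weighting each component by the client's share of
    the total data, and sample from that mixture."""
    rng = np.random.default_rng(seed)
    counts = np.array([n for _, _, n in stats], dtype=float)
    weights = counts / counts.sum()          # mixture weights sum to 1
    comps = rng.choice(len(stats), size=n_samples, p=weights)
    return np.stack(
        [rng.multivariate_normal(stats[k][0], stats[k][1]) for k in comps]
    )

# Two skewed (non-IID) clients: one centered near -2, the other near +3.
rng = np.random.default_rng(1)
client_a = rng.normal(-2.0, 0.5, size=(500, 2))
client_b = rng.normal(3.0, 0.5, size=(300, 2))
stats = [local_stats(client_a), local_stats(client_b)]
synthetic = sample_global(stats, n_samples=1000)
```

In a fuller sketch, each client could fit a multi-component Gaussian mixture locally and ship all component parameters, but the privacy-relevant point is the same: the server reconstructs an approximate global density from statistics rather than from client data.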