Optimized Federated Learning on Class-Biased Distributed Data Sources

Yongli Mou, Jiahui Geng, Sascha Welten, Chunming Rong, Stefan Decker, Oya Beyan

Published: 2021, Last Modified: 07 Mar 2025. Venue: PKDD/ECML Workshops (1) 2021. License: CC BY-SA 4.0

Abstract: Due to privacy protection requirements, conventional machine learning approaches, which upload all data to a central location, have become less feasible. Federated learning, a privacy-preserving distributed machine learning paradigm, has been proposed as a solution to comply with privacy requirements. By enabling multiple clients to collaboratively learn a shared global model, model parameters instead of local private data are exchanged under privacy restrictions. However, compared with centralized approaches, federated learning suffers from performance degradation when trained on non-independent and identically distributed (non-i.i.d.) data across the participants. Meanwhile, the class imbalance problem is commonly encountered in machine learning practice and causes poor predictions on minority classes. In this work, we propose FedBGVS to alleviate the class bias severity by employing a balanced global validation set. The model aggregation algorithm is refined by using the Balanced Global Validation Score (BGVS). We evaluate our methods through experiments conducted on the classical benchmark datasets MNIST, SVHN and CIFAR-10 as well as the public clinical dataset ISIC-2019. The empirical results demonstrate that our proposed methods outperform the state-of-the-art federated learning algorithms in label distribution skew and class imbalance settings.
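The abstract does not state how the BGVS enters the aggregation step, so the following is only a minimal sketch under assumptions, not the authors' algorithm. It assumes that each client model is scored by mean per-class recall on a class-balanced validation set, and that the server averages client parameters with weights proportional to those scores. The function names `balanced_score` and `aggregate` and the flat parameter lists are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def balanced_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall, so every class counts equally.

    ASSUMPTION: used here as a stand-in for a balanced validation score;
    the paper's exact BGVS definition is not given in the abstract.
    """
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def aggregate(client_params: List[List[float]], scores: List[float]) -> List[float]:
    """Score-weighted average of client parameter vectors.

    ASSUMPTION: weights are proportional to each client's score; if all
    scores are zero, fall back to a uniform average.
    """
    total = sum(scores)
    if total == 0:
        weights = [1.0 / len(scores)] * len(scores)
    else:
        weights = [s / total for s in scores]
    n_params = len(client_params[0])
    return [
        sum(w * params[j] for w, params in zip(weights, client_params))
        for j in range(n_params)
    ]


if __name__ == "__main__":
    # Two hypothetical clients evaluated on the same balanced validation labels.
    y_val = [0, 0, 1, 1]
    scores = [
        balanced_score(y_val, [0, 0, 1, 0]),  # misses one minority-class sample
        balanced_score(y_val, [0, 1, 1, 0]),  # misses one sample in each class
    ]
    print(aggregate([[1.0, 2.0], [3.0, 4.0]], scores))
```

Under this assumed weighting, a client whose model performs poorly on minority classes receives a lower score and contributes less to the global model, whereas standard federated averaging weights clients by local sample count.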