Adaptive Client Clustering for Efficient Federated Learning Over Non-IID and Imbalanced Data

Biyao Gong, Tianzhang Xing, Zhidan Liu, Wei Xi, Xiaojiang Chen

Published: 2024, Last Modified: 04 Feb 2026. IEEE Trans. Big Data 2024. License: CC BY-SA 4.0

Abstract: Federated learning (FL) is an emerging distributed, privacy-preserving machine learning framework. However, the performance of traditional FL methods is seriously impaired by real-world data, which are often non-independent and identically distributed (non-IID). Recent clustered federated learning (CFL) methods eliminate the impact of non-IID data by grouping clients with similar data distributions into the same cluster. Unfortunately, existing CFL methods rely heavily on a preset number of clusters and therefore fail to achieve adaptive client clustering. Even worse, we experimentally observe that imbalanced data across clients largely degrades the correctness of their client clustering. In this paper, we present AutoCFL, a novel CFL method that requires no manual intervention and eliminates the effects of non-IID and imbalanced data simultaneously. To deal with imbalanced data, a local training adjustment strategy adaptively adjusts the number of local training epochs for each client. To further improve clustering correctness and adaptability, a weighted voting-based client clustering strategy automatically assigns each client to an appropriate cluster. Extensive experiments evaluate the design of AutoCFL on three popular datasets under various data settings. The results demonstrate that AutoCFL outperforms state-of-the-art methods under non-IID and imbalanced data settings: on average, it improves model accuracy by $9.24\%$ over the standard FL method, FedAvg, while significantly reducing communication costs by $4.67\times$ through adaptive client clustering.
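
The abstract does not state the rule AutoCFL uses to adjust local training epochs, so the following is only a hypothetical illustration of the general idea, not the authors' method. It assumes (our assumption) that clients with fewer samples are given more local epochs so that every client performs roughly as many sample passes as a median-sized client. The function name `adjusted_epochs` and the parameters `base_epochs`, `min_epochs`, and `max_epochs` are invented for this sketch.

```python
import math


def adjusted_epochs(sample_counts, base_epochs=5, min_epochs=1, max_epochs=20):
    """Return one local-epoch count per client.

    Hypothetical rule (not taken from the paper): scale the epoch count
    inversely with the client's dataset size, relative to the median-sized
    client, then clip the result to [min_epochs, max_epochs].
    """
    if not sample_counts:
        return []
    if any(n <= 0 for n in sample_counts):
        raise ValueError("every client needs at least one sample")

    # Use the (upper) median dataset size as the reference workload.
    ordered = sorted(sample_counts)
    reference = ordered[len(ordered) // 2]

    epochs = []
    for n in sample_counts:
        # A client with half the reference data gets twice the base epochs.
        raw = base_epochs * reference / n
        # Round up, then clip so tiny clients do not train indefinitely.
        epochs.append(max(min_epochs, min(max_epochs, math.ceil(raw))))
    return epochs


if __name__ == "__main__":
    # Three clients holding 100, 200, and 400 samples.
    print(adjusted_epochs([100, 200, 400]))
```

Under this assumed rule, the smallest client trains longest and the clipping bounds keep extreme imbalance from producing unbounded epoch counts. Whether AutoCFL scales epochs in this direction or by this formula is not stated in the abstract.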

External IDs: dblp:journals/tbd/GongXLXC24