G²uardFL: Safeguarding Federated Learning Against Backdoor Attacks via Attributed Client Graph Clustering

Hao Yu, Chuan Ma, Meng Liu, Tianyu Du, Ming Ding, Tao Xiang, Shouling Ji, Xinwang Liu

Published: 2026, Last Modified: 25 Mar 2026. IEEE Trans. Inf. Forensics Secur. 2026. License: CC BY-SA 4.0

Abstract: Federated Learning (FL) enables collaborative model training across multiple decentralized devices without sharing data directly, enhancing privacy and data security. However, FL systems are susceptible to backdoor attacks, in which malicious clients inject poisoned weights during training. Existing defenses, primarily based on anomaly detection, tend to reject benign weights erroneously while accepting poisoned ones, largely because they quantify similarities among client models poorly. Furthermore, other defenses are effective only against a limited number of malicious clients, typically fewer than 10%. To address these vulnerabilities, we present G²uardFL, a protective framework that recasts the detection of malicious clients as an attributed graph clustering problem, thus safeguarding FL systems. Specifically, the framework employs a client graph clustering approach to identify malicious clients and integrates an adaptive mechanism that amplifies the discrepancy between the aggregated model and the poisoned ones, effectively eliminating embedded backdoors. We empirically compare G²uardFL with cutting-edge defenses, such as FLAME (USENIX Security 2022) and DeepSight (NDSS 2022), against various backdoor attacks, including 3DFed (SP 2023). The results demonstrate that it mitigates backdoor attacks effectively while having a negligible impact on the aggregated model’s performance on benign samples (i.e., the primary task performance). For instance, in an FL system with 25% malicious clients, G²uardFL reduces the attack success rate to 10.61% while maintaining a primary task performance of 80.98% on the CIFAR-10 dataset. This outperforms the best-performing baseline, which achieves an attack success rate of only 19.54%.
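The general idea of treating malicious-client detection as a graph clustering problem can be illustrated with a toy sketch. This is not the paper's algorithm: the cosine-similarity edge rule, the `threshold` value, the use of connected components as a stand-in for attributed graph clustering, and the rule "keep the largest cluster" are all assumptions made for illustration, and the adaptive mechanism mentioned in the abstract is not modeled. All function names are hypothetical.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two flat update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_client_graph(updates, threshold=0.5):
    """Link two clients when their updates are similar (assumed edge rule)."""
    n = len(updates)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(updates[i], updates[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Simple stand-in for graph clustering: one cluster per component."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, members = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            members.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(sorted(members))
    return clusters


def aggregate_largest_cluster(updates, threshold=0.5):
    """Average only the updates in the largest cluster (assumed benign)."""
    clusters = connected_components(build_client_graph(updates, threshold))
    benign = max(clusters, key=len)
    dim = len(updates[0])
    mean = [sum(updates[i][d] for i in benign) / len(benign) for d in range(dim)]
    return benign, mean


# Toy round: three similar updates and one outlier pointing elsewhere.
updates = [
    [1.0, 0.9, 1.1],
    [0.9, 1.0, 1.0],
    [1.1, 1.0, 0.9],
    [-1.0, 2.0, -1.5],  # toy "poisoned" update
]
benign_ids, aggregated = aggregate_largest_cluster(updates)
```

In this toy round the outlier has negative cosine similarity to every other update, so it ends up isolated in the graph and is excluded from the average. A real defense would need a clustering method that does not rely on a fixed threshold or on the benign clients forming the single largest group.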

External IDs: dblp:journals/tifs/YuMLDDXJL26