FedClean: A General Robust Label Noise Correction for Federated Learning

Xiaoqian Jiang; Jing Zhang

Published: 01 May 2025, Last Modified: 18 Jun 2025. Venue: ICML 2025 poster. License: CC BY 4.0

Abstract: Many federated learning scenarios encounter label noise in client-side datasets. The resulting degradation in global model performance makes addressing label noise an urgent need. This paper proposes FedClean, a novel, general, and robust label noise correction method for federated learning. FedClean first applies centralized noisy-label learning locally on each client to select clean samples, which are used to train a global model. It then employs a two-stage correction scheme that corrects the noisy labels from two distinct perspectives: local noisy-label learning and the global model. FedClean also introduces a novel model aggregation method that further reduces the impact of label noise. FedClean assumes neither the existence of clean clients nor a specific noise distribution, which makes it broadly applicable. Extensive experimental results show that FedClean effectively identifies and rectifies label noise even when all clients have noisy labels, and that it outperforms state-of-the-art noisy-label learning methods for federated learning.
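The abstract gives the pipeline only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes small-loss selection as the clean-sample criterion, a confidence threshold for the two correction stages (local predictions first, then global predictions), and aggregation weighted by each client's clean-sample count. All function names, parameters, and rules here (`select_clean`, `two_stage_correct`, `aggregate`, `keep_ratio`, `threshold`) are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def select_clean(losses, keep_ratio=0.7):
    """Assumed criterion: keep the keep_ratio fraction of samples with the smallest loss."""
    n_keep = max(1, int(round(keep_ratio * len(losses))))
    return np.sort(np.argsort(losses, kind="stable")[:n_keep])

def two_stage_correct(labels, noisy_idx, local_probs, global_probs, threshold=0.9):
    """Assumed rule: relabel suspected-noisy samples in two passes."""
    corrected = labels.copy()
    remaining = []
    # Stage 1 (assumption): accept confident predictions from the local model.
    for i in noisy_idx:
        if local_probs[i].max() >= threshold:
            corrected[i] = local_probs[i].argmax()
        else:
            remaining.append(i)
    # Stage 2 (assumption): for the rest, accept confident global-model predictions.
    for i in remaining:
        if global_probs[i].max() >= threshold:
            corrected[i] = global_probs[i].argmax()
    return corrected

def aggregate(client_weights, clean_counts):
    """Assumed rule: average client parameters, weighted by clean-sample count."""
    w = np.asarray(clean_counts, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.stack(client_weights), axes=1)

# Toy usage with made-up numbers.
losses = np.array([0.1, 2.0, 0.3, 1.5])
clean_idx = select_clean(losses, keep_ratio=0.5)

labels = np.array([0, 1, 1, 0])
local_probs = np.array([[0.9, 0.1], [0.95, 0.05], [0.2, 0.8], [0.5, 0.5]])
global_probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.1, 0.9], [0.05, 0.95]])
corrected = two_stage_correct(labels, [1, 3], local_probs, global_probs)

global_params = aggregate([np.array([1.0, 1.0]), np.array([3.0, 5.0])], [1, 3])
```

Weighting by clean-sample count is only one plausible way to reduce the influence of noisier clients; the paper's actual aggregation method is not specified in the abstract.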

Lay Summary: When we train machine learning models using data from many different users (such as phones, hospitals, or sensors), we often rely on federated learning, which keeps data private by training the model directly on each device. However, these local datasets may contain mistakes, like mislabeled images or incorrect diagnoses, which can seriously hurt the model’s performance. Our research tackles this problem by introducing FedClean, a new method that automatically detects and fixes these labeling errors. First, FedClean looks at each user’s data to identify reliable examples, then builds a stronger shared model using only those. Next, it corrects the remaining mistakes using a two-step process: one step based on local insights and the other using the shared global model’s predictions. Finally, it combines all models in a smarter way that reduces the influence of noisy data. FedClean works even when all users have some incorrect data and doesn't assume we know where the errors come from. This makes it a practical solution for real-world federated learning systems in healthcare, education, and mobile apps.

Primary Area: General Machine Learning

Keywords: federated learning, noisy labels, model aggregation

Flagged For Ethics Review: true

Submission Number: 6398
