Unbiased Sample Selection and Label Improvement for Mitigating Noisy Labels in Class-Imbalanced Datasets

Yuan Wang, Yakun Chang, Ying Qin, Yao Zhao, Shikui Wei

Published: 2025, Last Modified: 22 Jan 2026. IEEE Trans. Circuits Syst. Video Technol. 2025. License: CC BY-SA 4.0

Abstract: Real-world datasets often suffer from both noisy labels and imbalanced class distributions, which makes it difficult to deploy deep neural networks (DNNs) effectively. Existing studies typically address these two problems separately and struggle when both occur together. In this paper, we introduce GSS, an unbiased Sample Selection method based on the Graph Attention Network (GAT). GSS divides the training set into clean and noisy subsets while avoiding sample selection bias by analyzing the intrinsic relationships between the training set and a small clean validation set. For the clean subset, we propose an Adaptive Label Refinement (ALR) strategy to improve label reliability. ALR dynamically integrates the network’s predictions with the given labels, mitigating the adverse impact of misidentified samples. For the noisy subset, we introduce a Class-Balanced Pseudo Labeling (CBPL) method. CBPL addresses the bias in model predictions caused by class imbalance by integrating class distribution information into the pseudo-label generation process, resulting in more accurate pseudo-labels. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the effectiveness of our approach, especially when noisy labels and class imbalance occur together.
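The abstract does not give formulas for ALR or CBPL, so the following is only an illustrative sketch of the two general ideas it names, not the authors' method. It assumes that label refinement is a per-sample convex combination of the given one-hot label and the predicted distribution, and that class-balanced pseudo-labeling divides predicted probabilities by the empirical class prior before taking the argmax (a standard prior-correction trick swapped in here for illustration). The function names, the `weight` argument, and the `tau` exponent are all hypothetical.

```python
import numpy as np


def refine_labels(given_onehot, probs, weight):
    """Blend given labels with model predictions (assumed form, not from the paper).

    weight is a per-sample trust in the given label, in [0, 1]:
    refined = weight * given + (1 - weight) * predicted.
    """
    given_onehot = np.asarray(given_onehot, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Reshape to a column so each sample's weight broadcasts across classes.
    weight = np.asarray(weight, dtype=float).reshape(-1, 1)
    return weight * given_onehot + (1.0 - weight) * probs


def balanced_pseudo_labels(probs, class_counts, tau=1.0):
    """Prior-corrected pseudo-labels (assumed form, not from the paper).

    Divides each predicted probability by (class prior ** tau) so that
    predictions skewed toward frequent classes are counteracted.
    tau = 0 recovers the plain argmax of the predictions.
    """
    probs = np.asarray(probs, dtype=float)
    prior = np.asarray(class_counts, dtype=float)
    prior = prior / prior.sum()
    adjusted = probs / np.power(prior, tau)
    # Renormalize so each row is again a probability distribution.
    adjusted = adjusted / adjusted.sum(axis=1, keepdims=True)
    return adjusted.argmax(axis=1), adjusted


if __name__ == "__main__":
    given = [[1, 0], [0, 1]]
    preds = [[0.6, 0.4], [0.7, 0.3]]
    print(refine_labels(given, preds, weight=[1.0, 0.5]))

    # Two classes with a 9:1 imbalance; the model leans toward the head class.
    labels, adjusted = balanced_pseudo_labels([[0.7, 0.3]], class_counts=[900, 100])
    print(labels, adjusted)
```

In this sketch a sample with `weight=1.0` keeps its given label unchanged, while lower weights shift the target toward the model's prediction. In the imbalanced example the prior correction moves the pseudo-label from the head class to the tail class, which is the kind of effect a class-aware pseudo-labeling step aims for.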

External IDs: dblp:journals/tcsv/WangCQZW25