Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients

TMLR Paper5062 Authors

09 Jun 2025 (modified: 25 Jun 2025)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Large-scale datasets invariably contain annotation noise. Re-labeling methods have been developed to handle annotation noise in large-scale datasets. Though various methodologies to alleviate annotation noise have been developed, these are particularly time-consuming and computationally intensive. The requirement of high computational power and longer time duration can be drastically reduced by selecting a representative coreset. In this work, we adapt a noise-free gradient-based coreset selection method towards re-labeling applications for noisy datasets with erroneous labels. We introduce ‘confidence score’ to the coreset selection method to cater for the presence of noisy labels. Through extensive evaluation over CIFAR-100N, Web Vision, and ImageNet-1K Datasets, we demonstrate that our method outperforms the SOTA coreset selection for re-labeling methods (DivideMix and SOP+). We have provided the anonymized codebase at URL.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Anurag_Arnab1
Submission Number: 5062
Loading