Coreset-Driven Re-Labeling: Tackling Noisy Annotations with Noise-Free Gradients

Published: 02 Oct 2025, Last Modified: 02 Oct 2025, Accepted by TMLR, License: CC BY 4.0
Abstract: Large-scale datasets invariably contain annotation noise. Re-labeling methods have been developed to handle this noise, but they are time-consuming and computationally intensive. Both the compute and the time required can be drastically reduced by selecting a representative coreset. In this work, we adapt a noise-free gradient-based coreset selection method to re-labeling of noisy datasets with erroneous labels. We introduce a ‘confidence score’ into the coreset selection method to account for the presence of noisy labels. Through extensive evaluation on the CIFAR-100N, WebVision, and ImageNet-1K datasets, we demonstrate that our method outperforms state-of-the-art coreset selection for re-labeling methods (DivideMix and SOP+). The codebase is available at URL.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: 1. Added the camera-ready version. 2. Replaced the anonymous GitHub URL with a public GitHub URL. 3. Added a comparison with Prune4Rel in Section 3.2. 4. Added results for the ViT architecture on the ImageNet-1K dataset in Table 8. 5. Added "Comparative analysis with Vision Language Models" in Section 4.9. 6. Added a discussion on "variation of softmax" in Section 4.10.3.
Video: https://drive.google.com/file/d/1Ph2pb8f8TIvjbj2DbPtDGihXRVFvu9zc/view?usp=sharing
Code: https://github.com/ai23resch04001/GradRelabelling
Assigned Action Editor: ~Anurag_Arnab1
Submission Number: 5062