Abstract: Large-scale datasets invariably contain annotation noise, and re-labeling methods have been developed to correct erroneous labels. However, these methods are time-consuming and computationally intensive. Both the computational cost and the training time can be drastically reduced by selecting a representative coreset. In this work, we adapt a noise-free gradient-based coreset selection method to re-labeling applications on noisy datasets. We introduce a ‘confidence score’ into the coreset selection method to account for the presence of noisy labels. Through extensive evaluation on the CIFAR-100N, WebVision, and ImageNet-1K datasets, we demonstrate that our method outperforms SOTA coreset selection methods for re-labeling (DivideMix and SOP+). We have provided the codebase at URL.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: 1. Added the camera-ready version.
2. Replaced the anonymous GitHub URL with a public GitHub URL.
3. Added comparison with Prune4Rel in section 3.2.
4. Added results for ViT architecture with ImageNet-1K dataset in table 8.
5. Added "Comparative analysis with Vision Language Models" in section 4.9.
6. Added a discussion on "variation of softmax" in section 4.10.3.
Video: https://drive.google.com/file/d/1Ph2pb8f8TIvjbj2DbPtDGihXRVFvu9zc/view?usp=sharing
Code: https://github.com/ai23resch04001/GradRelabelling
Assigned Action Editor: ~Anurag_Arnab1
Submission Number: 5062