Keywords: Noisy label learning, benchmarking noisy label learning for ViT
TL;DR: A systematic study to assess the vulnerability of ViT fine-tuning to noisy labels.
Abstract: Automatic annotation of large-scale datasets often introduces noisy labels, which degrade the performance of deep neural networks. Noisy Label Learning (NLL) has been well studied for Convolutional Neural Networks (CNNs); however, its effectiveness for Vision Transformers (ViTs) remains less explored. In this work, \texttt{NLL-ViT}, we comprehensively benchmark the robustness of ViTs under diverse label noise settings and recommend an entropy-based regularization to improve ViT performance.
To this end, we address several key research questions: the vulnerability of ViTs to noisy labels, the robustness of ViTs relative to CNNs, the effectiveness of existing NLL methods for ViTs, the correlation between prediction entropy reduction and ViT robustness, and the impact of the recommended entropy regularization on robustness. We conducted more than 850 experiments, evaluating ViT-B/16 and ViT-L/16 fine-tuned via MLP-K with two standard loss functions and ten state-of-the-art NLL methods. Our benchmark spans three noise types (closed-set, open-set, and real-world) across eight datasets: CIFAR-10, CIFAR-100, CIFAR-10N, CIFAR-100N, CIFAR80-O, WebVision, Clothing1M, and Food-101N.
Our findings show that ViT fine-tuning is vulnerable to noisy labels, yet more robust to label noise than CNN training; existing CNN-based NLL methods are effective only in closed-set settings and fail to outperform standard losses under open-set and real-world noise. We also observe a strong correlation between prediction entropy reduction and ViT robustness, and the recommended entropy regularization, combined with standard classification losses, significantly enhances ViTs' robustness to noisy labels. We will release the \texttt{NLL-ViT} code publicly upon acceptance.
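To make the recommended regularizer concrete, below is a minimal sketch of an entropy-regularized classification loss, assuming (as the abstract's entropy-reduction/robustness correlation suggests) that the regularizer penalizes the Shannon entropy of the model's softmax predictions on top of standard cross-entropy. The weighting hyperparameter `lam` is hypothetical and not specified by the abstract.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_loss(logits, targets, lam=0.1):
    """Cross-entropy plus a prediction-entropy penalty (sketch).

    Penalizing the entropy of the softmax output pushes the network
    toward low-entropy (confident) predictions. `lam` is a
    hypothetical weighting hyperparameter, not from the paper.
    """
    ce = F.cross_entropy(logits, targets)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Mean Shannon entropy of the predicted class distributions.
    entropy = -(probs * log_probs).sum(dim=-1).mean()
    return ce + lam * entropy

# Usage: a batch of 4 samples with 10 classes.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.tensor([0, 3, 7, 1])
loss = entropy_regularized_loss(logits, targets)
loss.backward()
```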
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 8445