PViT: Prior-Augmented Vision Transformer for Out-of-Distribution Detection

TMLR Paper8628 Authors

26 Apr 2026 (modified: 19 May 2026). Under review for TMLR. License: CC BY 4.0

Abstract: Vision Transformers (ViTs) have achieved remarkable success across a wide range of vision tasks, yet their robustness to data distribution shifts and their inherent inductive biases remain underexplored. To enhance the robustness of ViT models for image out-of-distribution (OOD) detection, we introduce a novel and generic framework named Prior-augmented Vision Transformer (PViT). We train PViT to predict class labels while taking as input both image tokens and the prior class logits from a pretrained model. During inference, PViT identifies OOD samples by quantifying the divergence between its predicted class logits and the prior logits obtained from pretrained models. Unlike existing state-of-the-art (SOTA) OOD detection methods, PViT shapes the decision boundary between in-distribution (ID) and OOD samples using the proposed prior-guided confidence, without requiring additional data modeling, generation methods, or structural modifications. Extensive experiments on the large-scale ImageNet benchmark, evaluated against more than seven OOD datasets, demonstrate that PViT significantly outperforms existing SOTA OOD detection methods in terms of FPR95 and AUROC.
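The inference rule summarized in the abstract (score a sample by how far PViT's predicted logits diverge from the prior logits) can be illustrated with a small sketch. It assumes the divergence is the KL divergence between the softmax of the prior logits and the softmax of the predicted logits, with larger values treated as more likely OOD; the abstract does not specify the divergence measure or how the prior-guided confidence is computed, so `prior_divergence_score` and the other function names here are hypothetical and this is not the authors' method. The sketch also shows one common way to compute the two reported metrics: FPR95 (the false positive rate on OOD data at the threshold that keeps 95% of ID samples) and AUROC, using NumPy and synthetic logits rather than real model outputs.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def prior_divergence_score(pred_logits: np.ndarray,
                           prior_logits: np.ndarray,
                           eps: float = 1e-12) -> np.ndarray:
    """Hypothetical OOD score: KL(prior || prediction) for each sample.

    ASSUMPTION: KL divergence on softmax probabilities; the abstract only says
    PViT measures a divergence between predicted and prior logits.
    A higher score means the prediction disagrees more with the prior,
    which this sketch treats as more likely OOD.
    """
    p = softmax(prior_logits)
    q = softmax(pred_logits)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """FPR95 with ID as the positive class and higher score meaning OOD."""
    # Threshold that accepts 95% of ID samples (score <= threshold -> "ID").
    threshold = np.percentile(id_scores, 95)
    # Fraction of OOD samples wrongly accepted as ID at that threshold.
    return float(np.mean(ood_scores <= threshold))


def auroc(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """AUROC as P(OOD score > ID score), counting ties as 1/2."""
    diff = ood_scores[:, None] - id_scores[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, num_classes = 200, 10
    # Synthetic stand-ins: ID predictions stay close to the prior logits,
    # OOD predictions are unrelated to them. These are not real model outputs.
    prior_id = rng.normal(size=(n, num_classes))
    pred_id = prior_id + 0.1 * rng.normal(size=(n, num_classes))
    prior_ood = rng.normal(size=(n, num_classes))
    pred_ood = 3.0 * rng.normal(size=(n, num_classes))

    id_s = prior_divergence_score(pred_id, prior_id)
    ood_s = prior_divergence_score(pred_ood, prior_ood)
    print(f"FPR95: {fpr_at_95_tpr(id_s, ood_s):.3f}")
    print(f"AUROC: {auroc(id_s, ood_s):.3f}")
```

The pairwise AUROC above uses O(n·m) memory and only suits small score sets; for benchmark-scale evaluation a rank-based implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.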

Submission Type: Long submission (more than 12 pages of main content)

Assigned Action Editor: ~Xiaofeng_Cao1

Submission Number: 8628

Loading