Exploring plain ViT features for multi-class unsupervised visual anomaly detection

Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Ming-Hsuan Yang, Dacheng Tao

Published: 2025, Last Modified: 09 Nov 2025. Comput. Vis. Image Underst. 2025. CC BY-SA 4.0

Abstract: Highlights
• Novel ViTAD Model: A novel ViTAD model inspired by Meta-AD, using ViT for multi-class unsupervised anomaly detection from global and local views.
• SoTA Performance: ViTAD achieves SoTA on MVTec AD (85.4 mAD, +3.0), VisA, and Uni-Medical datasets, evaluated with eight metrics.
• Efficiency and Robustness: ViTAD trains in 1.1 h with 2.3G GPU memory on a single V100, offering an efficient and robust baseline for future research.
• In-Depth Analysis: Provides insights and ablations on factors affecting MUAD performance, deepening understanding of the task.