Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors

Guangyao Zhai; Yue Zhou; Xinyan Deng; Lars Heckler-Kram; Nassir Navab; Benjamin Busam

Published: 26 Jan 2026, Last Modified: 11 Apr 2026, Venue: ICLR 2026 Poster, License: CC BY 4.0

Keywords: Representation Learning, Few-Shot Anomaly Detection, Applications of Foundation Models

TL;DR: We introduce a few-shot anomaly detection method using foundation visual encoders and a nonlinear projection onto the natural image manifold. It detects structural anomalies efficiently, supports multi-class cases, and delivers strong performance.

Abstract: Few-shot anomaly detection streamlines and simplifies industrial safety inspection. However, with only a few samples it is difficult to distinguish normal from abnormal features accurately, and even more so under category-agnostic conditions. Large-scale pre-training of foundation visual encoders has advanced many fields, as the enormous quantity of data helps to learn the general distribution of normal images. We observe that the amount of anomaly in an image directly correlates with the difference in the learnt embeddings, and we use this observation to design a few-shot anomaly detector termed FoundAD. It learns a nonlinear projection operator onto the natural image manifold. This simple operator serves as an effective tool for characterizing and identifying out-of-distribution regions in an image. Extensive experiments show that our approach supports multi-class detection and achieves competitive performance compared to other approaches, while outperforming them in terms of model size and inference efficiency. Backed by evaluations with multiple foundation encoders, including the recent DINOv3, we believe this idea broadens the perspective on foundation features and advances the field of few-shot anomaly detection. Our code is at https://github.com/ymxlzgy/FoundAD.
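
The scoring idea in the abstract (compare each patch embedding with its projection onto the manifold of normal features, and treat the gap as the anomaly score) can be illustrated with a minimal sketch. This is not the FoundAD implementation: the learned nonlinear projection operator is replaced here by a nearest-neighbour lookup in a small bank of normal features, and the embeddings are synthetic random vectors rather than outputs of a foundation encoder. The names `project_to_normal`, `anomaly_map`, and `normal_bank` are hypothetical.

```python
import numpy as np

def project_to_normal(features, normal_bank):
    """Stand-in projector (assumption): snap each patch feature to its
    nearest neighbour among the few-shot normal features. FoundAD instead
    learns a nonlinear projection operator."""
    # Pairwise squared distances, shape (num_patches, num_normal).
    diff = features[:, None, :] - normal_bank[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    return normal_bank[d2.argmin(axis=1)]

def anomaly_map(features, normal_bank):
    """Per-patch score: distance between a feature and its projection."""
    projected = project_to_normal(features, normal_bank)
    return np.linalg.norm(features - projected, axis=1)

rng = np.random.default_rng(0)
dim = 16

# Synthetic stand-ins for patch embeddings of a few normal images.
normal_bank = rng.normal(size=(64, dim))

# A test "image" of 8 patches that are close to normal features...
test_features = normal_bank[:8] + 0.01 * rng.normal(size=(8, dim))
# ...except patch 3, which is pushed far away to simulate a defect.
test_features[3] += 5.0

scores = anomaly_map(test_features, normal_bank)
image_score = scores.max()  # image-level score: worst patch
print(scores.round(3), image_score.round(3))
```

In this toy setup the perturbed patch receives by far the largest score, which mirrors the abstract's observation that the embedding difference grows with the amount of anomaly. A learned operator would take the place of `project_to_normal` without changing the scoring step.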

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 6144
