Abstract: Most existing anomaly detection methods rely on small models tailored to specific industrial scenarios; these models are strongly task-oriented but generalize poorly. The Segment Anything Model (SAM), a vision foundation model designed for semantic segmentation tasks, has demonstrated remarkable performance in natural scene segmentation. However, SAM lacks domain-specific knowledge of industrial defects and relies on an interactive inference framework, which restricts its application in industrial anomaly detection. In this article, we propose LScAD, a large–small model collaboration framework for unsupervised industrial anomaly detection, which uses task-oriented small models to guide SAM toward precise anomaly localization. Specifically, the small model generates initial guidance information, which serves as the input to a multimodal prompt module. This module consists of two prompt modalities: image and text. These prompts then guide SAM to segment anomalies accurately. Moreover, we design a dual-branch adapter, consisting of a color-domain branch and a frequency-domain branch, to enhance SAM’s domain-specific capability and improve its performance on anomaly detection tasks. Extensive experiments on the MVTec AD benchmark and other real-world industrial datasets demonstrate that our method achieves state-of-the-art performance, with an image-level area under the receiver operating characteristic curve (AUROC) of 99.6%, a pixel-level AUROC of 98.4%, and an average precision (AP) of 74.1% on the MVTec AD dataset. Our method also adapts to multiclass anomaly detection without any modifications and achieves remarkable performance. Our code is available at: https://github.com/qsc1103/LScAD
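For readers unfamiliar with the reported metrics, the following is a minimal sketch of how image-level and pixel-level AUROC are commonly computed from anomaly maps. It is not taken from the LScAD code: the function names `auroc` and `image_and_pixel_auroc` are hypothetical, and scoring each image by the maximum of its anomaly map is an assumption (a common convention, not necessarily the one used in the paper).

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U), with ties given average ranks."""
    scores = np.asarray(scores, dtype=np.float64).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    # Average rank of each distinct score value (1-based ranks).
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    end_rank = np.cumsum(counts)
    avg_rank = end_rank - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    # U statistic of the positive class, normalized to [0, 1].
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))

def image_and_pixel_auroc(anomaly_maps, gt_masks):
    """anomaly_maps, gt_masks: arrays of shape (N, H, W).

    Assumption: an image's score is the maximum of its anomaly map, and an
    image is anomalous if its ground-truth mask has any positive pixel.
    """
    maps = np.asarray(anomaly_maps, dtype=np.float64)
    masks = np.asarray(gt_masks).astype(bool)
    image_scores = maps.reshape(len(maps), -1).max(axis=1)
    image_labels = masks.reshape(len(masks), -1).any(axis=1)
    return auroc(image_scores, image_labels), auroc(maps, masks)
```

Image-level AUROC measures how well anomalous images are ranked above normal ones, while pixel-level AUROC applies the same ranking criterion to every pixel, which is why the two numbers can differ on the same dataset.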
External IDs: dblp:journals/tim/QuTGQPSZD25