FOCT: Few-shot Industrial Anomaly Detection with Foreground-aware Online Conditional Transport

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: Few-Shot Industrial Anomaly Detection (FS-IAD) has recently drawn considerable attention, as data efficiency and the ability to design algorithms that transfer quickly across products have become primary concerns. The difficulty of memory-based IAD in low-data regimes lies primarily in the inefficient measurement between the memory bank and query images. We address this pivotal issue from a new perspective: optimal matching between the features of image regions. Taking into account the unbalanced nature of patch-wise industrial image features, we adopt Conditional Transport (CT) as a metric to compute the structural distance between representations of the memory bank and query images, thereby determining feature relevance. CT produces the optimal matching flows between unbalanced structural elements that achieve the minimum matching cost. These flows can be used directly for IAD because they reflect well how query images differ from the normal memory. Since query images usually arrive one by one or batch by batch, we further propose Online Conditional Transport (OCT), which makes full use of current and historical query images by simultaneously calibrating the memory bank and matching features between the calibrated memory and the current query features. Going one step further, for products with sparse foregrounds, we employ a leading segmentation model to implement Foreground-aware OCT (FOCT), which improves the effectiveness and efficiency of OCT by forcing the model to attend to diverse targets rather than redundant background when calibrating the memory bank. FOCT improves the diversity of the calibrated memory throughout the IAD process, which is critical for robust FS-IAD in practice. FOCT is also flexible: it can be plugged into any pre-trained backbone, such as WRN, and any pre-trained segmentation model, such as SAM.
The effectiveness of our model is demonstrated on diverse datasets, including the MVTec and MPDD benchmarks, where it achieves state-of-the-art (SOTA) performance.
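The abstract does not give FOCT's equations, so the following is only a minimal NumPy sketch of the general idea of scoring query patches by a conditional-transport-style cost against a normal memory bank. It assumes a cosine cost, softmax conditional plans with a hypothetical temperature `tau`, uniform patch weights, and equal weighting of the forward (query to memory) and backward (memory to query) costs. `calibrate_memory` is a hypothetical stand-in for the online memory calibration of OCT/FOCT, with `fg_mask` playing the role of a foreground mask from a segmentation model such as SAM; its threshold and FIFO size cap are also assumptions. None of these names or choices come from the paper.

```python
import numpy as np


def cosine_cost(query: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Pairwise cost 1 - cos(q, m) between query patches (n, d) and memory patches (k, d)."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return 1.0 - q @ m.T  # shape (n, k), values in [0, 2]


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def ct_scores(query: np.ndarray, memory: np.ndarray, tau: float = 0.05):
    """Assumed CT-style matching with uniform patch weights (not the paper's exact form).

    Returns per-query-patch anomaly scores and an image-level CT distance.
    """
    cost = cosine_cost(query, memory)
    fwd = softmax(-cost / tau, axis=1)  # pi(memory | query): each row sums to 1
    bwd = softmax(-cost / tau, axis=0)  # pi(query | memory): each column sums to 1
    patch_scores = (fwd * cost).sum(axis=1)     # expected forward cost per query patch
    backward = (bwd * cost).sum(axis=0).mean()  # expected backward cost, averaged over memory
    distance = 0.5 * patch_scores.mean() + 0.5 * backward  # equal weighting is an assumption
    return patch_scores, distance


def calibrate_memory(memory, query, patch_scores, fg_mask, thresh=0.1, max_size=1024):
    """Hypothetical online update: append foreground query patches that look normal."""
    keep = fg_mask & (patch_scores < thresh)
    updated = np.concatenate([memory, query[keep]], axis=0)
    return updated[-max_size:]  # FIFO cap on memory size (an assumption)


# Usage with synthetic features (stand-ins for backbone patch features).
rng = np.random.default_rng(0)
memory = rng.normal(size=(64, 16))                   # patches from a few normal shots
normal_query = memory[:32] + 0.01 * rng.normal(size=(32, 16))
odd_query = rng.normal(size=(32, 16))                # unrelated patches
s_norm, d_norm = ct_scores(normal_query, memory)
s_odd, d_odd = ct_scores(odd_query, memory)
fg = np.ones(32, dtype=bool)                         # stand-in for a foreground mask
new_memory = calibrate_memory(memory, normal_query, s_norm, fg)
```

Using softmax conditional plans instead of a hard nearest-neighbor lookup lets each query patch spread its mass over several memory patches (and each memory patch over several query patches), which is one simple way to handle unbalanced patch sets; the paper's actual cost, plan, and calibration rules may differ.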
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: This work contributes to multimedia/multimodal processing by enabling efficient, adaptable, and effective anomaly detection across diverse and complex datasets from only a few examples. This is particularly valuable when data is scarce, varied, and changing, which are common challenges in multimedia processing.
Submission Number: 1833