Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection

Published: 20 Jul 2024, Last Modified: 06 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Knowledge distillation based on a student-teacher network is one of the mainstream solution paradigms for the challenging unsupervised Anomaly Detection task, exploiting the difference in representation capabilities between the teacher and student networks to localize anomalies. However, over-generalization of the student network to the teacher network may leave negligible representation differences for anomalies, degrading detection effectiveness. Existing methods address this possible over-generalization either structurally, by differentiating the student and teacher architectures, or content-wise, by explicitly expanding the distilled information; both inevitably increase the likelihood of student-network underfitting and yield poor detection at the anomaly center or edges. In this paper, we propose Dual-Modeling Decouple Distillation (DMDD) for unsupervised anomaly detection. In DMDD, a Decouple Student-Teacher Network is proposed to decouple the initial student features into normality and abnormality features. We further introduce Dual-Modeling Distillation based on normal-anomaly image pairs: it fits the normality features of an anomalous image to the teacher features of the corresponding normal image, while widening the distance between the abnormality features and the teacher features in anomalous regions. Combining these two distillation ideas, we achieve anomaly detection that focuses on both the edge and the center of an anomaly. Finally, a Multi-perception Segmentation Network is proposed to achieve focused anomaly-map fusion based on multiple attention mechanisms. Experimental results on MVTec AD show that DMDD surpasses the SOTA localization performance of previous knowledge distillation-based methods, reaching 98.85% pixel-level AUC and 96.13% PRO.
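The two distillation objectives described in the abstract can be sketched as a pair of complementary losses. The following is a minimal, hypothetical NumPy sketch, not the authors' implementation: it assumes per-pixel cosine distance as the feature discrepancy, a synthetic anomaly mask from the normal-anomaly image pair, and a hinge margin for pushing abnormality features away from the teacher. All function and parameter names are illustrative.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-8):
    # Per-pixel cosine distance between two feature maps of shape (C, H, W).
    num = (a * b).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + eps
    return 1.0 - num / den

def dual_modeling_losses(f_normality, f_abnormality, f_teacher_normal,
                         anomaly_mask, margin=1.0):
    """Hypothetical sketch of the two distillation objectives.

    f_normality, f_abnormality: decoupled student features of the
        anomalous image, each of shape (C, H, W).
    f_teacher_normal: teacher features of the paired normal image, (C, H, W).
    anomaly_mask: binary (H, W) mask marking the synthesized anomalous region.
    """
    # Normality distillation: pull the normality branch toward the teacher
    # features of the corresponding normal image at every location.
    l_norm = cosine_distance(f_normality, f_teacher_normal).mean()

    # Abnormality distillation: push the abnormality branch away from the
    # teacher features inside anomalous regions, up to a hinge margin.
    d = cosine_distance(f_abnormality, f_teacher_normal)
    mask = anomaly_mask.astype(bool)
    l_abnorm = np.maximum(margin - d[mask], 0.0).mean() if mask.any() else 0.0

    return l_norm, l_abnorm
```

In this reading, minimizing `l_norm` keeps the normality features anomaly-free, while minimizing `l_abnorm` enlarges the student-teacher discrepancy in anomalous regions, so the final anomaly map responds at both the edge and the center of a defect.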
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: 1. Support for multi-modal methods: Current multi-modal anomaly detection combines RGB images and point clouds to achieve 3D anomaly detection. Our work can therefore serve as the RGB-image processing component in multi-modal anomaly detection solutions and support their implementation. 2. Generalised approach for multiple modalities: Keeping the same implementation idea, our work can be applied to data of many different modalities, such as infrared images, point clouds, time series, and video, by simply replacing the network backbone and the specific loss functions used.
Submission Number: 5452