POUTA - Produce once, utilize twice for anomaly detection

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: anomaly detection, anomaly escape, overkill, reconstruction-based
TL;DR: A new approach to reconstruction-based anomaly detection is proposed, which reuses the features of the reconstructive network instead of the images, locating anomalies more accurately and at lower cost than the vanilla method.
Abstract: Visual anomaly detection aims to classify and locate regions that deviate from normal appearance. One line of work is the reconstruction-based approach, which locates anomalies by analyzing the difference between the original and reconstructed images. However, when the reconstructed image is of low quality or the anomaly is fine-grained, this image-level difference analysis usually fails. Handling these two cases requires more accurate information. We observe that the features of the reconstructive network contain more accurate information about the anomaly than the image-level difference does. To leverage this feature-level information, we propose POUTA. In POUTA, the discriminative network takes the encoder and decoder features of the reconstructive network as the features of the original and reconstructed images, respectively. Each discriminative layer follows a coarse-to-fine process in which this information is refined by high-level semantics and a semantic supervision loss. Since the discriminative network now accepts features as input, a separate feature-extraction stage (a discriminative encoder) is unnecessary. In other words, POUTA produces the features in the reconstructive network once but utilizes them twice, for reconstruction and discrimination separately, which reduces the parameter count and improves efficiency. Experiments show that, compared with the vanilla method, POUTA achieves better performance with even fewer parameters and less inference time. POUTA also outperforms state-of-the-art reconstruction-based methods on the MVTec AD, VisA and DAGM datasets.
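To make the architecture concrete, below is a minimal PyTorch sketch of the "produce once, utilize twice" idea as we read it from the abstract. The module names (ReconstructiveNet, DiscriminativeNet), layer sizes, and single-scale feature fusion are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch: the reconstructive network exposes its intermediate
# features, and the discriminative network reuses them directly, so no
# separate discriminative encoder is needed. All names and sizes here
# are assumptions for illustration only.
import torch
import torch.nn as nn


class ReconstructiveNet(nn.Module):
    """Autoencoder that also returns its intermediate features."""
    def __init__(self, ch=(3, 32, 64)):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(ch[0], ch[1], 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch[1], ch[2], 3, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(ch[2], ch[1], 4, 2, 1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(ch[1], ch[0], 4, 2, 1)

    def forward(self, x):
        e1 = self.enc1(x)         # encoder features ~ original image
        e2 = self.enc2(e1)
        d2 = self.dec2(e2)        # decoder features ~ reconstructed image
        recon = self.dec1(d2)
        return recon, [e1], [d2]  # features are produced once here ...


class DiscriminativeNet(nn.Module):
    """Predicts an anomaly map from (encoder, decoder) feature pairs."""
    def __init__(self, ch=32):
        super().__init__()
        self.fuse = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, 1, 1), nn.ReLU())
        self.head = nn.Conv2d(ch, 1, 1)

    def forward(self, enc_feats, dec_feats):
        # Compare corresponding encoder/decoder features. Only one scale is
        # shown; POUTA refines coarse-to-fine across several layers with
        # high-level semantics and a semantic supervision loss.
        f = self.fuse(torch.cat([enc_feats[0], dec_feats[0]], dim=1))
        return self.head(f)       # ... and utilized a second time here


recon_net, disc_net = ReconstructiveNet(), DiscriminativeNet()
x = torch.randn(2, 3, 64, 64)
recon, enc_feats, dec_feats = recon_net(x)    # one forward pass
anomaly_map = disc_net(enc_feats, dec_feats)  # reuses the same features
print(recon.shape, anomaly_map.shape)         # (2, 3, 64, 64), (2, 1, 32, 32)
```

The point of the sketch is that the discriminative network consumes features already computed during reconstruction, so the second image encoder of the vanilla pipeline disappears, which is where the parameter and inference-time savings come from.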
Supplementary Material: pdf
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2554