Abstract: Zero-shot anomaly detection (ZSAD) aims to detect anomalies without access to any normal or abnormal samples from the target dataset. Existing approaches use the pre-trained CLIP model with a frozen visual encoder and assess normality or abnormality from the similarity between image and text features. However, keeping the CLIP visual encoder frozen limits further performance gains. In addition, the anomaly representations these approaches produce are sensitive to contextual variations, which leads to poor localization of unseen abnormalities. This paper therefore introduces Dual Consistency Learning for Zero-Shot Anomaly Detection (C2AD), which comprises two components: semantic consistency and contextual consistency. Semantic consistency enhances generalization by maintaining correlational semantic consistency, while contextual consistency encourages representations to be robust to contextual changes. C2AD improves model training without adding computational overhead at inference. Comprehensive experiments demonstrate that C2AD improves ZSAD performance in both anomaly detection and localization, achieving state-of-the-art results.
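As a rough illustration of the image-text similarity scoring that the abstract attributes to existing CLIP-based approaches, the following is a minimal sketch, not the C2AD method itself. It assumes patch features and two text-prompt features (one "normal", one "abnormal") have already been extracted. The function names, the temperature value, and the max-pooling step for an image-level score are all hypothetical choices made for this example, and random vectors stand in for real CLIP embeddings.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def anomaly_scores(patch_feats, text_feats, temperature=0.07):
    """Score each patch by its similarity to an 'abnormal' text prompt.

    patch_feats: (N, D) array of visual patch embeddings (assumed given).
    text_feats:  (2, D) array; row 0 = 'normal' prompt, row 1 = 'abnormal' prompt.
    Returns an (N,) array of abnormality probabilities in [0, 1].
    """
    p = l2_normalize(patch_feats)
    t = l2_normalize(text_feats)
    logits = p @ t.T / temperature               # (N, 2) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)     # softmax over the two prompts
    return probs[:, 1]

# Toy stand-ins for real embeddings (assumption: no actual CLIP model is loaded).
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(2, 8))
# Patch 0 lies close to the 'normal' prompt, patch 1 close to the 'abnormal' prompt.
patch_feats = text_feats + 0.01 * rng.normal(size=(2, 8))

scores = anomaly_scores(patch_feats, text_feats)  # per-patch scores (localization)
image_score = scores.max()                        # one simple image-level score
```

The per-patch scores play the role of a localization map, and pooling them gives a detection score. In this sketch the encoder features are treated as fixed inputs, which mirrors the frozen-encoder setting the abstract identifies as a limitation.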
External IDs: dblp:conf/icassp/WangXTLYWS25