Abstract: Semi-supervised anomaly detection (AD) has garnered growing attention due to its ability to effectively leverage limited labeled data to identify anomalies. However, current methods often impose artificial constraints on the proportion of unlabeled anomalies in the training set, thereby impeding the effective training of models for anomaly detection in real-world scenarios where several anomalies may be present in the unlabeled dataset. Additionally, existing methods often struggle to effectively exploit and model the complex relationships between data instances, which is critical for learning more discriminative features and accurate distance measures. Distance-based methods, in particular, typically rely on Euclidean distance metric, which lacks the flexibility to capture complex correlations across different data dimensions. To address the above challenges, we propose CAD, a denoising-aware <u>C</u>ontrastive distance learning framework for semi-supervised <u>AD</u>. It introduces a contrastive training objective to facilitate the learning of distinctive representations by contrasting the average distance between anomalies and unlabeled samples. To fully exploit the information from the unlabeled data meanwhile mitigate the effects of noise, we incorporate a two-stage anomaly denoising and expansion strategy to refine the dataset by identifying high-confidence samples from the unlabeled set. Furthermore, we employ a parameterized bilinear tensor distance layer to learn a customized distance metric, enabling the model to capture intricate relationships among data points. Extensive experiments on 10 real-world datasets demonstrate that CAD significantly outperforms existing semi-supervised AD models. Code available at https://github.com/CADrepo/CAD.
External IDs:dblp:conf/www/GaoTSJ025
Loading