Track: Web mining and content analysis
Keywords: Anomaly Detection, Representation Learning, Contrastive Learning, Denoising
Abstract: Semi-supervised anomaly detection (AD) has garnered growing attention due to its ability to effectively combine limited labeled data with abundant unlabeled data. However, current methods of-ten impose artificial constraints on the proportion of unlabeled anomalies in the training set or overlook potential noise from these anomalies, thereby impeding the effective training of models for anomaly detection in real-world scenarios where several anomalies may be present in the unlabeled dataset. Additionally, existing methods often struggle to effectively exploit and model the complex relationships between data instances, which is critical for learning more discriminative features and accurate distance measures. Distance-based methods, in particular, typically rely on Euclidean distance metric, which lacks the flexibility to capture complex correlations across different data dimensions. To address above challenges, we propose CAD, a denoising-aware Contrastive distance learning framework for semi-supervised AD. It introduces a contrastive training objective to facilitate the learning of distinctive representations by contrasting the average distance between anomalies and unlabeled samples. To fully exploit the information from the unlabeled data meanwhile mitigate the effects of noise, we incorporate a two-stage anomaly denoising and expansion strategy to refine the dataset by identifying high-confidence samples from the unlabeled set. Furthermore, we employ a parameterized bilinear tensor distance layer to learn a customized distance metric, enabling the model to capture intricate relationships among data points. Extensive experiments on 10 real-world datasets demonstrate that CAD significantly outperforms existing semi-supervised AD models. Code available at https://github.com/CADrepo/CAD.
Submission Number: 1543
Loading