Abstract: Self-supervised anomaly detection (AD) methods define transformations and surrogate tasks to learn data “normality” in depth, achieving superior performance. Unlike most existing work, which is designed for images, this paper considers self-supervised AD for tabular data, which has two desiderata: (i) transformation operations need to generate diverse transformed samples for downstream contrastive learning; (ii) the learning of surrogate tasks is required to perceive the semantics of tabular data, i.e., local–global patterns comprising intra-instance regularities and inter-instance structures. Related studies devise applicable transformations, but their surrogate tasks often neglect inter-instance structures and thus fail to describe data “normality” comprehensively and accurately. To fill these gaps, this paper proposes a novel partition contrastive learning-based anomaly detection method. We first devise a new transformation: transformed samples are created as representations of sub-vectors generated from different partitions of the whole feature space, so that different feature couplings are embedded in the transformed samples, making them sufficiently diverse. To capture intra-instance regularities, our approach learns a representation space in which transformed samples repel one another while resembling their corresponding spatial centers. A constraint is imposed on these spatial centers to preserve the similarities between the corresponding original instances, facilitating the learning of inter-instance structures. The synergy of these two learning objectives encourages the modeling of tabular data semantics, thereby comprehensively modeling data “normality”. The degree of abnormality of test data is obtained by evaluating whether the data conform to the learned patterns shared by the majority of the data. Extensive experiments show that our approach achieves significant improvements (13% in AUC-ROC and 30% in AUC-PR) over state-of-the-art AD methods.
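To make the transformation step concrete, the following is a minimal NumPy sketch of one plausible reading of it: each instance's features are randomly assigned to partitions, and each transformed view keeps the sub-vector of one partition. The function name, the random feature-to-partition assignment, and the zero-masking used to keep all views in the original dimensionality are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def partition_transform(x, n_partitions, seed=None):
    """Create diverse views of one instance by partitioning its feature space.

    Illustrative sketch: each feature index is randomly assigned to one of
    `n_partitions` groups; view k retains the sub-vector of partition k and
    zero-masks the remaining features, so each view embeds a different
    feature coupling while sharing the original dimensionality.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Randomly assign each feature index to one of the partitions.
    assignment = rng.integers(0, n_partitions, size=d)
    views = []
    for k in range(n_partitions):
        view = np.zeros_like(x)
        mask = assignment == k
        view[mask] = x[mask]  # keep only the sub-vector of partition k
        views.append(view)
    return views

# Usage: three diverse views of a single 8-dimensional tabular instance.
instance = np.array([0.2, 1.5, -0.3, 0.9, 2.1, -1.2, 0.0, 0.7])
views = partition_transform(instance, n_partitions=3, seed=0)
```

Under this reading, the resulting views would then serve as the positive samples in the contrastive objective, each pulled toward its spatial center while the centers preserve inter-instance similarities.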