A Neural Probabilistic outlier detection method for categorical data

Li Cheng, Yijie Wang, Xingkong Ma

Published: 2019, Last Modified: 13 Nov 2024. Neurocomputing 2019. CC BY-SA 4.0

Abstract: Unsupervised outlier detection for categorical data is essential for a broad range of applications across domains. The complex interactions between attributes and the varying relevance of attributes make it a stern challenge. Existing methods, including pattern-based and coupling-based methods, either fail to capture the complex interactions or cannot handle the diverse attributes well. In this paper, we propose a novel Neural Probabilistic Outlier Detection method for categorical data, called NPOD. We present a new log-bilinear neural model to learn the categorical distributions, and we observe that the inliers and the outliers can be well separated according to their learning loss. Based on this observation, we give both empirical and theoretical analyses and present a new neural network architecture that captures the interactions of attributes. Moreover, discriminative information is used in the proposed bias training process to make the inliers and the outliers more separable. Lastly, to distinguish the relevance of attributes, two indicators are proposed for computing an ensemble outlier score to obtain a reliable result. Experimental results show that NPOD significantly outperforms state-of-the-art competitors on 12 real-world data sets in terms of AUC and P@k.
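The two evaluation metrics named in the abstract, AUC and P@k, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `roc_auc` and `precision_at_k` and the toy scores are hypothetical, and the default of setting k to the number of true outliers is a common convention assumed here rather than something the abstract states. Ground-truth labels are used only for evaluation, since the detector itself is unsupervised.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random outlier outscores a random inlier.

    Uses the pairwise (Mann-Whitney) formulation; ties count as half a win.
    labels: 1 marks an outlier, 0 marks an inlier.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k=None):
    """Fraction of true outliers among the k highest-scoring objects.

    Assumption: when k is not given, it defaults to the number of true outliers.
    """
    if k is None:
        k = sum(labels)
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Toy outlier scores (higher means more anomalous) with ground-truth labels.
scores = [0.9, 0.8, 0.3, 0.7, 0.1, 0.2]
labels = [1, 0, 0, 1, 0, 0]
print(roc_auc(scores, labels))         # 0.875
print(precision_at_k(scores, labels))  # 0.5
```

In this toy case one inlier (score 0.8) is ranked above one outlier (score 0.7), which costs one of the eight outlier-inlier pairs in the AUC and one of the two top-k slots in P@k.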