Reconstruction-based Anomaly Detection with Completely Random Forest

Yi-Xuan Xu, Ming Pang, Ji Feng, Kai Ming Ting, Yuan Jiang, Zhi-Hua Zhou

2021 (modified: 17 Apr 2023) | SDM 2021 | Readers: Everyone

Abstract: Reconstruction-based anomaly detectors have drawn much attention recently. Existing methods rely almost universally on the neural network autoencoder and its variants. Their performance is limited because a neural network autoencoder requires a large training set to achieve high accuracy and incurs a high computational cost. In addition, its performance depends heavily on tuning a large number of hyper-parameters. Our work is motivated by recent studies showing that a forest model can also capture information about a dataset, to a degree comparable to that of a neural network autoencoder. We propose a novel reconstruction-based anomaly detector based solely on a completely random forest. The proposed method, RecForest, has three advantages over existing methods. First, the forest model trains much more efficiently and has significantly fewer hyper-parameters, which addresses the two above-mentioned issues of neural network autoencoders. Second, RecForest has two capabilities that existing forest-based anomaly detectors lack: it can mine outlying attributes, and it can handle irrelevant attributes in high-dimensional datasets. Third, when mining outlying attributes, RecForest runs orders of magnitude faster than state-of-the-art outlying aspect miners on large datasets. We verify the effectiveness and efficiency of the proposed method through extensive experiments.
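
To make the idea of reconstruction with a completely random forest concrete, the following is a minimal sketch in pure Python. It is not the authors' RecForest implementation. The abstract does not spell out the reconstruction rule, so the sketch assumes one plausible scheme: each tree splits on a randomly chosen attribute at a random threshold, a sample's leaf in every tree defines a bounding box, the boxes are intersected across trees, and the sample is reconstructed as the midpoint of that intersection. The squared reconstruction error serves as the anomaly score, and the per-attribute errors hint at which attributes are outlying. All function names and parameters (`build_tree`, `fit_forest`, `reconstruct`, `attribute_errors`, `anomaly_score`, `n_trees`, `max_depth`) are hypothetical.

```python
import random

def build_tree(X, rng, depth=0, max_depth=8):
    """Grow one completely random tree; None marks a leaf."""
    if depth >= max_depth or len(X) <= 1:
        return None
    d = rng.randrange(len(X[0]))              # random split attribute
    lo, hi = min(x[d] for x in X), max(x[d] for x in X)
    if lo == hi:                              # attribute is constant here
        return None
    t = rng.uniform(lo, hi)                   # random split threshold
    left = [x for x in X if x[d] <= t]
    right = [x for x in X if x[d] > t]
    return (d, t,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))

def fit_forest(X, n_trees=50, seed=0):
    """Return the trees plus the per-attribute range of the training data."""
    rng = random.Random(seed)
    bounds = [(min(x[d] for x in X), max(x[d] for x in X))
              for d in range(len(X[0]))]
    trees = [build_tree(X, rng) for _ in range(n_trees)]
    return trees, bounds

def reconstruct(x, forest):
    """Assumed rule: midpoint of the intersection of the leaf boxes of x."""
    trees, bounds = forest
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    for node in trees:
        while node is not None:               # follow x down to its leaf
            d, t, left, right = node
            if x[d] <= t:
                hi[d] = min(hi[d], t)         # tighten upper bound
                node = left
            else:
                lo[d] = max(lo[d], t)         # tighten lower bound
                node = right
    return [(l + h) / 2 for l, h in zip(lo, hi)]

def attribute_errors(x, forest):
    """Squared reconstruction error per attribute."""
    return [(a - b) ** 2 for a, b in zip(x, reconstruct(x, forest))]

def anomaly_score(x, forest):
    """Total squared reconstruction error; larger means more anomalous."""
    return sum(attribute_errors(x, forest))

# Toy usage: train on points in the unit square, then score two queries.
data_rng = random.Random(42)
train = [[data_rng.random(), data_rng.random()] for _ in range(200)]
forest = fit_forest(train)
print(anomaly_score([0.5, 0.5], forest))      # typical point: small error
print(attribute_errors([0.5, 5.0], forest))   # second attribute dominates
```

In this sketch the reconstruction always lies inside the range of the training data, so a query far outside that range on some attribute receives a large error on exactly that attribute. Training involves no optimisation, only random splits, which illustrates why such a forest has few hyper-parameters to tune.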
