ImAD: An End-to-End Method for Unsupervised Anomaly Detection in the Presence of Missing Values

Feng Xiao; Jicong Fan

20 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Anomaly Detection, Missing Values

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: Common anomaly detection methods require fully observed data for model training and inference and cannot handle data containing missing values. Missing data are pervasive in real-world scenarios, but anomaly detection with missing data has received little study. In this work, we first construct and evaluate a straightforward "impute-then-detect" strategy that combines state-of-the-art data imputation methods with unsupervised anomaly detection methods, where the training data consist only of normal samples. We observe that such two-stage methods often suffer from an imputation bias toward normal data: the imputation methods tend to make incomplete samples look "normal". The fundamental reason is that the imputation models are learned from normal data and do not generalize to abnormal data. To address this challenging problem, we propose ImAD, an end-to-end method for unsupervised anomaly detection in the presence of missing values. ImAD integrates data imputation and anomaly detection into a unified optimization problem and introduces well-designed pseudo-abnormal samples to ensure the discrimination ability of the imputation process. Experiments under three missing-data mechanisms (MCAR, MAR, and MNAR) show that ImAD alleviates the imputation bias and achieves much better detection performance than the baselines on both balanced and skewed data.
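The "impute-then-detect" baseline and the imputation bias described above can be illustrated with a deliberately simple sketch. It is not ImAD and not the paper's experimental protocol. Assumptions: per-feature mean imputation stands in for the state-of-the-art imputers, distance to the centroid of the imputed training data stands in for the anomaly detector, and `make_mask`, `fit_impute_then_detect`, and `score` are hypothetical helpers with simplified MCAR, MAR, and MNAR mask generators.

```python
import numpy as np


def make_mask(X, mechanism, rate=0.3, seed=0):
    """Boolean observation mask (True = observed) under a simplified mechanism.

    Hypothetical, illustrative generators (not the paper's protocol):
      MCAR: every entry is dropped independently with probability `rate`.
      MAR:  column 0 is always observed; other entries in a row are dropped
            with probability 2*rate when that row's column-0 value exceeds
            the column-0 median, otherwise kept.
      MNAR: an entry is dropped with probability 2*rate when its own value
            exceeds its column median, otherwise kept.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    hi = min(2 * rate, 1.0)
    if mechanism == "MCAR":
        p = np.full((n, d), rate)
    elif mechanism == "MAR":
        row_high = X[:, [0]] > np.median(X[:, 0])        # depends on observed col 0
        p = np.repeat(np.where(row_high, hi, 0.0), d, axis=1)
        p[:, 0] = 0.0                                     # col 0 is never missing
    elif mechanism == "MNAR":
        p = np.where(X > np.median(X, axis=0), hi, 0.0)  # depends on the value itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return rng.random((n, d)) >= p


def fit_impute_then_detect(X, M):
    """Two-stage baseline fitted on normal training data only.

    Stage 1 (imputer): per-feature mean of the observed training entries.
    Stage 2 (detector): centroid of the imputed training data.
    """
    mu = np.nanmean(np.where(M, X, np.nan), axis=0)
    center = np.where(M, X, mu).mean(axis=0)
    return mu, center


def score(X, M, mu, center):
    """Impute missing entries with the training means, then score by distance."""
    X_imp = np.where(M, X, mu)
    return np.linalg.norm(X_imp - center, axis=1)


rng = np.random.default_rng(1)
X_train = rng.standard_normal((2000, 3))            # normal samples only
M_train = make_mask(X_train, "MCAR", rate=0.3)
mu, center = fit_impute_then_detect(X_train, M_train)
thr = np.quantile(score(X_train, M_train, mu, center), 0.95)

x_anom = np.array([[0.0, 8.0, 8.0]])                 # a clear anomaly
s_full = score(x_anom, np.ones((1, 3), dtype=bool), mu, center)[0]
s_missing = score(x_anom, np.array([[True, False, False]]), mu, center)[0]
print(f"threshold={thr:.2f}  fully observed={s_full:.2f}  "
      f"features 1,2 missing={s_missing:.2f}")
```

Because the imputer has only seen normal data, the anomaly's missing coordinates are filled with normal-looking values, so its score drops below the 95th-percentile threshold even though the fully observed version lies far above it. This toy case mirrors the imputation bias the abstract attributes to two-stage methods; ImAD's pseudo-abnormal samples are meant to counter it, but how they are constructed is not described here.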

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: zip

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 2629