Abstract: Unsupervised Outlier Detection (UOD) is crucial for analyzing biomedical and health data, which often contain undesirable outliers. However, the complex distributions of real data make UOD difficult. Particularly challenging is the "masking effect", in which a small number of densely distributed outliers (also called clustliers) collectively mask themselves from detection. A related difficulty is distinguishing clustliers from small clusters. We therefore propose a novel Multi-Granular Outlier Detector (MGOD). It first partitions the dataset into subsets according to natural-neighbor topological relationships, which circumvents the non-trivial setting of the neighbor range. It then detects both clustliers and isolated samples (also called scatliers) using a newly designed anomaly score. The score takes into account both the density and the connectivity of samples to reflect different extents and types of abnormality. MGOD proves to be accurate and highly interpretable, and its performance is robust to its hyper-parameters, which are easy to set. Comprehensive evaluations compare MGOD with seven counterparts on 15 datasets, most of which are biomedical. Significance tests confirm the effectiveness and superiority of MGOD. The source code is available at https://anonymous.4open.science/r/MGOD-C531.
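To make the partitioning step more concrete, the following is a minimal sketch of a parameter-free natural-neighbor search followed by a split into connected subsets. It is an illustration under stated assumptions, not the MGOD implementation: the stopping rule (grow the neighbor range until the number of samples without any reverse neighbor stops changing), the use of connected components of the mutual-neighbor graph as "subsets", and the function names `natural_neighbor_graph` and `connected_subsets` are all hypothetical choices made here. The anomaly score itself is not specified in the abstract and is therefore not sketched.

```python
import numpy as np


def natural_neighbor_graph(X):
    """Return (lam, adj): the neighbor range reached and the mutual-neighbor graph.

    Assumption: the search grows k one step at a time and stops once every
    sample has a reverse neighbor, or once the number of samples without
    any reverse neighbor no longer changes between two rounds.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; a sample is never its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)  # order[i, r] is the (r+1)-th nearest of i

    knn = np.zeros((n, n), dtype=bool)  # knn[i, j]: j is among the k nearest of i
    prev_orphans = -1
    lam = 0
    for r in range(n - 1):
        knn[np.arange(n), order[:, r]] = True
        lam = r + 1
        orphans = int((knn.sum(axis=0) == 0).sum())  # samples nobody points to
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    adj = knn & knn.T  # keep only mutual (natural) neighbor pairs
    return lam, adj


def connected_subsets(adj):
    """Label each sample with the connected component it belongs to."""
    n = adj.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal over mutual-neighbor edges
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


# Toy data: two compact groups and one far-away sample.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [50, 50]])
lam, adj = natural_neighbor_graph(X)
print(lam, connected_subsets(adj))
```

On this toy data the far-away sample has no mutual neighbor, so it ends up alone in its own subset, while each compact group forms one subset. A detector along the lines the abstract describes would then still need to score subsets and samples by density and connectivity to tell clustliers from genuine small clusters.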