Keywords: anomaly detection, outlier detection, tabular data, benchmark
TL;DR: The most comprehensive anomaly detection benchmark including 30 algorithms and 57 datasets.
Abstract: Given the long list of anomaly detection algorithms developed over the last few decades, how do they perform with regard to (i) varying levels of supervision, (ii) different types of anomalies, and (iii) noisy and corrupted data? In this work, we answer these key questions by conducting (to the best of our knowledge) the most comprehensive anomaly detection benchmark, named ADBench, with 30 algorithms on 57 benchmark datasets. Our extensive experiments (98,436 in total) yield meaningful insights into the role of supervision and anomaly types, and point to future directions for researchers in algorithm selection and design. With ADBench, researchers can easily conduct comprehensive and fair evaluations of newly proposed methods on the datasets (including our contributed ones from the natural language and computer vision domains) against the existing baselines. To foster accessibility and reproducibility, we fully open-source ADBench and the corresponding results.
Supplementary Material: pdf
Contribution Process Agreement: Yes
In Person Attendance: Yes
Dataset Url: To facilitate reproducibility and a fast experimental pipeline for anomaly detection benchmarking, we have made all the benchmark datasets and algorithms publicly available under the BSD-2 License at https://github.com/Minqi824/ADBench, and we welcome any customized algorithms to be evaluated via the plug-and-play testbed of ADBench.
Author Statement: Yes
Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/arxiv:2206.09426/code)