Abstract: Detecting rare anomalies in batches of multidimensional data is challenging.
We propose an original supervised active-learning framework that sends a small number of data points from each batch to an expert for labeling as `anomaly' or `nominal'. Points are chosen via two mechanisms: (i) those most likely to be anomalies according to a supervised classifier trained on previously labeled data; and (ii) those suggested by an active learner. Instead of training the supervised classifier directly on the raw labeled data, we use the scores computed by an ensemble of $M$ user-defined unsupervised anomaly detectors as the classifier's input features. Our approach generalizes earlier attempts to linearly aggregate unsupervised anomaly detector scores, and broadens the scope of these methods from unordered bags of data to ordered data such as time series. Trials on simulated and real data suggest that this method usually outperforms linear strategies, often significantly.
The Python library acanag implements our proposed method.
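To make the selection loop described above concrete, the following is a minimal sketch and not the acanag API. Every name in it is hypothetical. It assumes $M = 2$ toy unsupervised detectors, uses a nearest-centroid rule as a stand-in for the supervised classifier, and uses uncertainty sampling as a stand-in for the active learner; the abstract specifies none of these choices.

```python
import math
import random


def detector_scores(batch):
    """Score every point with M = 2 toy unsupervised detectors (hypothetical stand-ins)."""
    dim = len(batch[0])
    mean = [sum(p[d] for p in batch) / len(batch) for d in range(dim)]
    scores = []
    for p in batch:
        dist = math.sqrt(sum((p[d] - mean[d]) ** 2 for d in range(dim)))  # distance to batch mean
        peak = max(abs(p[d] - mean[d]) for d in range(dim))  # largest single-coordinate deviation
        scores.append([dist, peak])
    return scores


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(scores, labels):
    """Stand-in supervised classifier: one centroid per class, in detector-score space."""
    anomalies = [s for s, y in zip(scores, labels) if y == 1]
    nominals = [s for s, y in zip(scores, labels) if y == 0]
    return centroid(anomalies), centroid(nominals)


def margin(model, s):
    """Positive when score vector s lies closer to the anomaly centroid than the nominal one."""
    c_anom, c_nom = model
    return math.dist(s, c_nom) - math.dist(s, c_anom)


def select_queries(model, scores, n_top, n_active):
    """Indices to send to the expert for labeling."""
    margins = [margin(model, s) for s in scores]
    ranked = sorted(range(len(scores)), key=lambda i: -margins[i])
    top = ranked[:n_top]  # mechanism (i): most anomaly-like under the classifier
    rest = ranked[n_top:]
    # mechanism (ii): assumed here to be uncertainty sampling (margin closest to zero)
    active = sorted(rest, key=lambda i: abs(margins[i]))[:n_active]
    return top + active


def make_batch(rng, n=50):
    """Synthetic 2-D batch with one planted anomaly at index 0."""
    batch = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    batch[0] = [6.0, 6.0]
    return batch


rng = random.Random(0)
labeled_batch = make_batch(rng)
labels = [1] + [0] * 49  # expert labels for the first batch: 1 = anomaly, 0 = nominal
model = fit(detector_scores(labeled_batch), labels)

new_batch = make_batch(rng)
queries = select_queries(model, detector_scores(new_batch), n_top=2, n_active=3)
print(queries)
```

In this toy setting the planted anomaly at index 0 of the new batch ranks first among the queried points, because its detector scores sit far from the nominal centroid and close to the anomaly centroid. The classifier only ever sees detector scores, never the raw coordinates, which is the design choice the abstract emphasizes.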
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: General revision of the article and the appendix, and added code to the acanag package for reproducing new trials now contained in the paper.
Assigned Action Editor: ~Philip_K._Chan1
Submission Number: 5788