Abstract: Detecting rare anomalies in batches of multidimensional data is challenging.
We propose an original supervised active-learning framework that sends a small number of data points from each batch to an expert for labeling as 'anomaly' or 'nominal' via two mechanisms: (i) points that a supervised classifier trained on previously labeled data deems most likely to be anomalies; and (ii) points suggested by an active learner. Instead of training the supervised classifier directly on the currently labeled raw data, we treat the scores calculated by an ensemble of $M$ user-defined unsupervised anomaly detectors as the classifier's input features. Our approach generalizes earlier attempts to linearly aggregate unsupervised anomaly detector scores, and broadens the scope of these methods from unordered bags of data to ordered data such as time series. Trials on simulated and real data suggest that this method usually outperforms linear strategies, often significantly.
The Python library acanag implements our proposed method.
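The two query mechanisms and the score-as-feature idea can be illustrated with a minimal, dependency-free sketch. It is not the acanag API and not necessarily the authors' exact algorithm. Every specific choice is an assumption made for illustration: the three detectors (`max_abs_zscore`, `dist_to_median`, `knn_distance`, so $M = 3$), plain logistic regression as the supervised classifier, uncertainty sampling (probability closest to 0.5) as the active learner, the cold-start rule of ranking by the unweighted sum of detector scores, the hypothetical `oracle` expert, and the synthetic 2-D batches.

```python
import math
import random
import statistics

# --- M hypothetical unsupervised detectors; each returns one score per point in a batch ---
def max_abs_zscore(batch):
    d = len(batch[0])
    cols = [[x[j] for x in batch] for j in range(d)]
    mus = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [max(abs(x[j] - mus[j]) / sds[j] for j in range(d)) for x in batch]

def dist_to_median(batch):
    med = [statistics.median(x[j] for x in batch) for j in range(len(batch[0]))]
    return [math.dist(x, med) for x in batch]

def knn_distance(batch, k=3):
    # Distance from each point to its k-th nearest neighbour within the batch
    return [sorted(math.dist(x, y) for j, y in enumerate(batch) if j != i)[k - 1]
            for i, x in enumerate(batch)]

DETECTORS = [max_abs_zscore, dist_to_median, knn_distance]  # M = 3

def score_features(batch):
    """Row i holds the M detector scores of point i; these are the classifier's input features."""
    return [list(row) for row in zip(*(det(batch) for det in DETECTORS))]

# --- Supervised classifier on score features (plain logistic regression, an assumption) ---
def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.01, epochs=2000):
    """Full-batch gradient descent on the mean logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def anomaly_scores(model, feats):
    if model is None:  # cold start (assumption): unweighted sum of detector scores
        return [sum(f) for f in feats]
    w, b = model
    return [sigmoid(b + sum(wj * fj for wj, fj in zip(w, f))) for f in feats]

# --- Per-batch query selection: (i) most likely anomalies, (ii) active learner ---
def select_queries(feats, model, n_top=2, n_active=1):
    s = anomaly_scores(model, feats)
    top = sorted(range(len(feats)), key=lambda i: -s[i])[:n_top]  # mechanism (i)
    rest = [i for i in range(len(feats)) if i not in top]
    if model is None:
        return top + random.sample(rest, n_active)  # no classifier yet: random query
    return top + sorted(rest, key=lambda i: abs(s[i] - 0.5))[:n_active]  # (ii) uncertainty sampling

def run(batches, oracle, n_top=2, n_active=1):
    X, y, model = [], [], None
    for batch in batches:
        feats = score_features(batch)
        for i in select_queries(feats, model, n_top, n_active):
            X.append(feats[i])
            y.append(oracle(batch[i]))  # expert label: 1 = anomaly, 0 = nominal
        if 0 < sum(y) < len(y):  # refit once both classes have been seen
            model = fit_logistic(X, y)
    return model, X, y

# --- Synthetic demo: nominal ~ N(0, I), anomalies ~ N((6, 6), I) ---
def make_batch(n=30, n_anom=2):
    data = [([random.gauss(0, 1), random.gauss(0, 1)], 0) for _ in range(n - n_anom)]
    data += [([random.gauss(6, 1), random.gauss(6, 1)], 1) for _ in range(n_anom)]
    random.shuffle(data)
    return [p for p, _ in data], [t for _, t in data]

def oracle(x):
    """Hypothetical expert: points far from the origin are anomalies."""
    return int(math.hypot(*x) > 4)

random.seed(0)
model, X, y = run([make_batch()[0] for _ in range(5)], oracle)
```

Because the classifier only ever sees the $M$ detector scores, its input dimension stays fixed at $M$ regardless of the raw data's dimension. A linear aggregation of the detector scores corresponds to the special case where the learned weights are the final model, which is why a more flexible classifier can be swapped in at `fit_logistic` without changing the query loop.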
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: This is the original submission. I would, however, like you to know that there is a fourth author of the paper who does not have an OpenReview account. I am currently unable to contact him to ask him to create one, as he appears to be absent due to family issues.
Assigned Action Editor: ~Philip_K._Chan1
Submission Number: 5788