Supervised score aggregation for active anomaly detection

Published: 13 Jan 2026, Last Modified: 13 Jan 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: Detecting rare anomalies in batches of multidimensional data is challenging. We propose an original supervised active-learning framework that, for each batch, sends a small number of data points to an expert for labeling as "anomaly" or "nominal". Points are selected by two mechanisms: (i) points that a supervised classifier trained on previously labeled data ranks as most likely to be anomalies; and (ii) points suggested by an active learner. Instead of training the supervised classifier directly on the currently labeled raw data, we use the scores calculated by an ensemble of $M$ user-defined unsupervised anomaly detectors as the classifier's input features. Our approach generalizes earlier attempts to linearly aggregate unsupervised anomaly detector scores, and broadens the scope of these methods from unordered bags of data to ordered data such as time series. Trials on simulated and real data suggest that this method usually outperforms linear strategies, often significantly. The Python library acanag implements our proposed method.
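The per-batch loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the acanag implementation or its API: the three detectors (distance to the batch mean, largest per-feature |z|-score, k-th nearest-neighbour distance), the per-batch standardization of scores, the logistic-regression classifier, the uncertainty-sampling active learner, the cold-start rule, the simulated expert and the helper names `detector_scores`, `standardize`, `fit_logreg` and `run` are all hypothetical choices made for this example. Only NumPy is used.

```python
import numpy as np


def detector_scores(X, k=5):
    """Hypothetical ensemble of M = 3 unsupervised detectors (higher = more anomalous)."""
    mu = X.mean(axis=0)
    z = (X - mu) / (X.std(axis=0) + 1e-12)
    s1 = np.linalg.norm(X - mu, axis=1)                  # distance to the batch mean
    s2 = np.abs(z).max(axis=1)                           # largest per-feature |z-score|
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s3 = np.sort(d, axis=1)[:, k]                        # k-th nearest-neighbour distance (column 0 is the point itself)
    return np.column_stack([s1, s2, s3])


def standardize(S):
    """Put each detector's scores on a common scale within the batch (assumption)."""
    return (S - S.mean(axis=0)) / (S.std(axis=0) + 1e-12)


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logreg(F, y, lr=0.05, n_iter=2000, l2=1e-2):
    """L2-regularized logistic regression fitted by plain gradient descent."""
    Fb = np.column_stack([np.ones(len(F)), F])           # prepend an intercept column
    w = np.zeros(Fb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Fb @ w)
        grad = Fb.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]                           # do not penalize the intercept
        w -= lr * grad
    return w


def run(batches, labels, k_top=3, k_active=2):
    """Process batches in turn; return how many anomalies mechanism (i) found per batch."""
    F_lab, y_lab, found = [], [], []
    for X, y_true in zip(batches, labels):
        F = standardize(detector_scores(X))              # the M detector scores act as features
        if len(set(y_lab)) < 2:
            p = sigmoid(F.mean(axis=1))                  # cold start: unsupervised average of scores
        else:
            w = fit_logreg(np.array(F_lab), np.array(y_lab, dtype=float))
            p = sigmoid(w[0] + F @ w[1:])
        top = list(np.argsort(-p)[:k_top])               # (i) points most likely to be anomalies
        ranked = np.argsort(np.abs(p - 0.5))             # (ii) most uncertain points first
        active = [i for i in ranked if i not in top][:k_active]
        for i in top + active:                           # the simulated expert reveals the true label
            F_lab.append(F[i])
            y_lab.append(int(y_true[i]))
        found.append(int(y_true[top].sum()))
    return found


# Simulated stream: 10 batches of 200 points in 4 dimensions, 4 shifted anomalies per batch.
rng = np.random.default_rng(0)
batches, labels = [], []
for _ in range(10):
    X = rng.normal(size=(200, 4))
    y = np.zeros(200, dtype=int)
    idx = rng.choice(200, size=4, replace=False)
    X[idx] += 4.0
    y[idx] = 1
    batches.append(X)
    labels.append(y)

found = run(batches, labels)
print(found)
```

Because ranking by a logistic regression on the score vector is the same as ranking by a weighted linear combination of the detector scores, this sketch sits at the linear-aggregation end of the family the abstract describes. Replacing `fit_logreg` with a nonlinear supervised classifier is where the proposed approach departs from linear strategies. The sketch also treats each batch as an unordered bag and does not cover ordered data such as time series.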
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Camera-ready version. Added a link to the webpage of the Python library and removed the library from the Supplementary Material. Note to AE: we have been in touch with the overall Editors, and they will add the fourth author's name to the system after the camera-ready version is submitted.
Code: https://github.com/yagu0/ActiveAnomalyAggregation/tree/main
Supplementary Material: zip
Assigned Action Editor: Philip K. Chan
Submission Number: 5788