SEAD: Unsupervised Ensemble of Streaming Anomaly Detectors

Saumya Gaurang Shah; Abishek Sankararaman; Balakrishnan Murali Narayanaswamy; Vikramank Singh

Published: 01 May 2025, Last Modified: 18 Jun 2025. Venue: ICML 2025 poster. License: CC BY-NC-ND 4.0

TL;DR: This paper introduces SEAD, the first technique for ensembling unsupervised anomaly detectors in the streaming setting, which adapts to the distribution of scores generated by the base anomaly detectors on each dataset.

Abstract: Can we efficiently choose the best Anomaly Detection (AD) algorithm for a data-stream without requiring anomaly labels? Streaming anomaly detection is hard. SOTA AD algorithms are sensitive to their hyperparameters and no single method works well on all datasets. The best algorithm/hyperparameter combination for a given data-stream can change over time with data drift. 'What is an anomaly?' is often application, context and dataset dependent. We propose SEAD (Streaming Ensemble of Anomaly Detectors), the first model selection algorithm for streaming, unsupervised AD. All prior AD model selection algorithms are either supervised, or only work in the offline setting when all data from the test set is available upfront. We show that SEAD is (i) unsupervised, i.e., requires no true anomaly labels, (ii) efficiently implementable in a streaming setting, (iii) agnostic to the choice of the base algorithms among which it chooses, and (iv) adaptive to non-stationarity in the data-stream. Experiments on 14 non-trivial public datasets and an internal dataset corroborate our claims.

Lay Summary: Real-time anomaly detection systems are important in applications ranging from cybersecurity to healthcare. These systems a) do not have access to labels, and b) need to adapt to changes in input data over time. Although many such methods exist for real-time anomaly detection, no single method works well for all applications. We ask the question: Is it possible to select the best model for the given application at every timestamp without using labels? We select the best models at each point in time by giving higher preference to models that have generated fewer detections in the past, using the intuition that anomalies are inherently rare. Our research presents the first model selection method for real-time anomaly detection without using labels.
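
The intuition in the lay summary (prefer models that have raised fewer detections so far, because anomalies are rare) can be illustrated with a small multiplicative-weights sketch. This is not the paper's algorithm: the binary flags, the learning rate `eta`, and the function names `update_weights` and `run_stream` are all assumptions made for illustration. It also omits the handling of non-stationarity that the abstract describes.

```python
import math

def update_weights(weights, flags, eta=0.5):
    """One multiplicative-weights step (illustrative assumption, not SEAD itself).

    flags[i] is 1 if hypothetical detector i flagged the current point as
    anomalous and 0 otherwise. A detector that fires is down-weighted by
    exp(-eta); weights are then renormalized to sum to 1.
    """
    scaled = [w * math.exp(-eta * f) for w, f in zip(weights, flags)]
    total = sum(scaled)
    return [s / total for s in scaled]

def run_stream(flag_stream, n_detectors, eta=0.5):
    """Process the stream one point at a time, starting from uniform weights."""
    weights = [1.0 / n_detectors] * n_detectors
    for flags in flag_stream:
        weights = update_weights(weights, flags, eta)
    return weights

# Toy stream: detector 0 fires once, detector 1 fires on every point.
stream = [(0, 1), (0, 1), (1, 1), (0, 1)]
final = run_stream(stream, n_detectors=2)

# Index of the detector currently preferred by the ensemble.
best = max(range(2), key=lambda i: final[i])
```

In this toy run the detector that fires on every point ends up with the smaller weight, so `best` selects the detector that fires rarely. No labels are used at any step; only each detector's own output drives the update.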

Primary Area: General Machine Learning->Online Learning, Active Learning and Bandits

Keywords: anomaly detection, online learning, streaming, continual learning

Submission Number: 8202
