READ: Robust and Efficient Anomaly Detection under Data Contamination and Limited Supervision
Abstract: Existing anomaly detection methods typically rely on large amounts of training data to learn the patterns of normal data for effective anomaly identification, which incurs substantial training time overhead. Since unlabeled data often contains considerable redundancy, selecting and utilizing a small yet representative subset instead of the entire dataset can significantly improve training efficiency while maintaining detection performance. To this end, we introduce an end-to-end reinforcement learning framework with a balanced sampling strategy that targets both normal and abnormal instances. This framework identifies and exploits potential anomalies in the unlabeled data while also sampling peripheral normal instances (which are often difficult to detect), thereby enhancing overall anomaly detection performance without incurring excessive sampling time. Additionally, we present a joint reward mechanism with inconsistency penalties that optimizes both the agent's action space and the representation space, further improving the quality of the sampled subset. Extensive experiments on four public datasets from different domains demonstrate the effectiveness and efficiency of our framework. The code is available at https://github.com/ZhouF-ECNU/READ.
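To make the abstract's two main ideas concrete, the following is a minimal sketch (not the authors' implementation, whose details are in the linked repository) of (a) a balanced sampling step that draws a small subset mixing likely anomalies and peripheral normal instances from unlabeled data, and (b) a joint reward that applies an inconsistency penalty when the agent's action disagrees with the detector's score. The function names, the score-based selection heuristic, the 50/50 split, and the threshold are illustrative assumptions.

```python
import numpy as np

def balanced_sample(anomaly_scores: np.ndarray, budget: int, anomaly_frac: float = 0.5):
    """Select a small, representative subset of unlabeled instances:
    a mix of likely anomalies and peripheral (hard) normal instances."""
    n_anom = int(budget * anomaly_frac)
    n_norm = budget - n_anom
    order = np.argsort(anomaly_scores)      # ascending: low score = likely normal
    likely_anomalies = order[-n_anom:]      # highest-scoring instances
    # "Peripheral" normals approximated as the normal-side instances closest to
    # the decision region (an assumption made for illustration only).
    normal_side = order[: len(order) // 2]
    peripheral_normals = normal_side[-n_norm:]
    return np.concatenate([likely_anomalies, peripheral_normals])

def joint_reward(action: int, score: float, threshold: float = 0.5,
                 base_reward: float = 1.0, penalty: float = 1.0) -> float:
    """Reward the agent when its action agrees with the detector's score;
    apply an inconsistency penalty when the two disagree."""
    predicted_anomalous = score >= threshold
    consistent = (action == 1) == predicted_anomalous
    return base_reward if consistent else -penalty

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.random(1000)               # stand-in detector scores on unlabeled data
    subset = balanced_sample(scores, budget=20)
    print("sampled indices:", subset)
    print("reward example:", joint_reward(action=1, score=scores[subset[0]]))
```

In the paper's framework the sampling policy and reward are learned end-to-end rather than fixed heuristics as above; the sketch only illustrates the roles the two components play.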