Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Provable and Practical Framework with Synthetic Anomalies

ICLR 2026 Conference Submission 19369 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Statistical Learning Theory; Anomaly Detection; Supervised Learning; Classification; Synthetic Data
TL;DR: We generalize unsupervised anomaly detection to the semi-supervised setting by showing that synthetic anomalies, previously used in unsupervised AD, remain provably and empirically beneficial when limited labeled anomalies are available.
Abstract: Anomaly detection (AD) is a critical task across domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically grounded principle is to train classifiers that distinguish normal data from (synthetic) anomalies. We extend this principle to semi-supervised AD, where the training data additionally include a small labeled subset of anomalies that may also appear at test time. We propose a theoretically grounded and empirically effective framework for semi-supervised AD that combines known and synthetic anomalies during training. To analyze this setting, we introduce the first mathematical formulation of semi-supervised AD, which generalizes the unsupervised case. Within this formulation, we show that synthetic anomalies enable (i) better anomaly modeling in low-density regions and (ii) optimal convergence guarantees for neural network classifiers, the first such theoretical result for semi-supervised AD. We empirically validate our framework on five diverse benchmarks and observe consistent performance gains. These gains extend beyond our theoretical framework to other classification-based AD methods, demonstrating the generalizability of the synthetic-anomaly principle in AD.
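To make the recipe the abstract describes concrete, here is a minimal sketch of classification-based semi-supervised AD: train a binary classifier on normal data versus a mix of synthetic anomalies and a few labeled real anomalies, then use the predicted anomaly probability as the score. The uniform-sampling scheme for synthetic anomalies, the toy data, and the classifier choice are illustrative assumptions, not the authors' exact method.

```python
# Hedged sketch: binary classification with synthetic + known anomalies.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy data: "normal" points from a Gaussian blob, plus a small labeled anomaly set.
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_labeled_anom = rng.normal(loc=4.0, scale=0.5, size=(10, 2))  # limited labeled anomalies

# Synthetic anomalies: uniform samples over an enlarged bounding box of the
# normal data (a common proxy for covering low-density regions).
lo, hi = X_normal.min(0) - 2.0, X_normal.max(0) + 2.0
X_synth = rng.uniform(lo, hi, size=(500, 2))

# Combine known and synthetic anomalies as the positive (anomalous) class.
X = np.vstack([X_normal, X_labeled_anom, X_synth])
y = np.concatenate([np.zeros(len(X_normal)),
                    np.ones(len(X_labeled_anom) + len(X_synth))])

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)

# The classifier's predicted anomaly probability serves as the anomaly score.
scores = clf.predict_proba(np.array([[0.0, 0.0], [4.0, 4.0]]))[:, 1]
print(scores)  # low score near the normal blob, high score far from it
```

The design choice to pool synthetic with labeled anomalies mirrors the paper's framing: synthetic samples supply coverage of low-density regions where no labels exist, while the few real labels anchor the anomaly class toward anomalies actually seen at test time.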
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 19369