How Low Can You Go? Identifying Prototypical In-Distribution Samples for Unsupervised Anomaly Detection

TMLR Paper3008 Authors

16 Jul 2024 (modified: 05 Apr 2025) · Rejected by TMLR · CC BY 4.0
Abstract: Unsupervised anomaly detection (UAD) alleviates large labeling efforts by training exclusively on unlabeled in-distribution data, through which outliers (out-of-distribution data) are detected as anomalies. While it is generally assumed that larger training datasets improve UAD performance, we find that training UAD models with extremely few carefully selected samples can match, or even surpass, the performance of training on the entire dataset. To investigate this effect, we introduce an unsupervised method to identify a compact core-set of prototypical samples that boosts UAD performance when training with only a select few samples. Our analysis across seven diverse UAD benchmarks from computer vision, industrial defect detection, and medicine shows that with just 25 selected samples, we exceed full-dataset training performance in 25 out of 67 categories. Furthermore, we find that the selected core-set of prototypical samples generalizes well across models and datasets, providing important insights into their in-distribution nature. These samples exhibit clear, unobstructed, high-fidelity characteristics, which highlights the importance of data quality over quantity in UAD training. The code is available at https://anonymous.4open.science/r/uad_prototypical_samples/
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: In response to the feedback and comments provided by the reviewers, we have incorporated all suggested changes and revisions as outlined in the official review comments. These changes span multiple sections of the paper, including improvements in clarity, precision, mathematical rigour, and overall presentation.
Assigned Action Editor: ~Philip_K._Chan1
Submission Number: 3008