Quantifying Statistical Significance of Deep Nearest Neighbor Anomaly Detection via Selective Inference

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Anomaly Detection, k-Nearest Neighbors, Statistical Test, Selective Inference, Deep Learning
TL;DR: We propose a method to quantify the statistical significance of anomalies detected by the deep $k$-nearest neighbor approach.
Abstract: In real-world applications, anomaly detection (AD) often operates without access to anomalous data, necessitating semi-supervised methods that rely solely on normal data. Among these methods, deep $k$-nearest neighbor (deep $k$NN) AD stands out for its interpretability and flexibility, leveraging distance-based scoring in deep latent spaces. Despite its strong performance, deep $k$NN lacks a mechanism to quantify uncertainty, an essential feature for critical applications such as industrial inspection. To address this limitation, we propose a statistical framework that quantifies the significance of detected anomalies in the form of $p$-values, thereby enabling control of the false positive rate at a user-specified significance level (e.g., 0.05). A central challenge lies in managing selection bias, which we tackle using Selective Inference, a principled framework for conducting inference conditioned on data-driven selections. We evaluate our method on diverse datasets and demonstrate that it provides reliable AD that is well-suited to industrial use cases.
Supplementary Material: zip
Primary Area: Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
Submission Number: 25480