Splitting & Integrating: Out-of-Distribution Detection via Adversarial Gradient Attribution

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | License: CC BY 4.0
Abstract: Out-of-distribution (OOD) detection is essential for enhancing the robustness and security of deep learning models in unknown and dynamic data environments. Gradient-based OOD detection methods, such as GAIA, analyse the explanation pattern representations of in-distribution (ID) and OOD samples by examining the sensitivity of model outputs w.r.t. model inputs, resulting in superior performance compared to traditional OOD detection methods. However, we argue that the non-zero gradient behaviors of OOD samples are not sufficiently distinguishable, especially when ID samples are subject to random perturbations in high-dimensional space, which negatively impacts the accuracy of OOD detection. In this paper, we propose a novel OOD detection method called \textbf{S \& I} based on layer \textbf{S}plitting and gradient \textbf{I}ntegration via Adversarial Gradient Attribution. Specifically, our approach splits the model's intermediate layers and iteratively updates adversarial examples layer by layer. We then integrate the attribution gradients from each intermediate layer along the attribution path from the adversarial examples to the actual input, yielding true explanation pattern representations for both ID and OOD samples. Experiments demonstrate that our S \& I algorithm achieves state-of-the-art results, with an average FPR95 of 29.05\% (ResNet34) / 38.61\% (WRN40) on the CIFAR100 benchmark and 37.31\% (BiT-S) on the ImageNet benchmark. Our code is available at: https://github.com/LMBTough/S-I
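
The sketch below illustrates, in PyTorch, the adversarial-gradient-attribution idea the abstract describes: an adversarial example is produced by iterative perturbation, attribution gradients are integrated along the straight-line path from the adversarial example back to the actual input, and the attribution magnitude is used as an OOD score. It is not the authors' implementation (see the linked repository): the layer-splitting step is omitted for brevity, and the function names and hyperparameters (`adversarial_example`, `attribution_score`, `eps`, `n_adv_steps`, `n_path_steps`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def adversarial_example(model, x, eps=0.01, n_adv_steps=5):
    # Iteratively perturb x in the direction that increases the loss w.r.t.
    # the model's own predicted labels (an FGSM-style inner loop).
    x_adv = x.clone().detach()
    for _ in range(n_adv_steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        loss = F.cross_entropy(logits, logits.argmax(dim=1))
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + eps * grad.sign()).detach()
    return x_adv


def attribution_score(model, x, n_path_steps=16, **adv_kwargs):
    # Integrate gradients of the top logit along the straight path from the
    # adversarial example back to the actual input (integrated-gradients style),
    # then score each sample by the norm of its attribution map.
    x_adv = adversarial_example(model, x, **adv_kwargs)
    total_grad = torch.zeros_like(x)
    for k in range(1, n_path_steps + 1):
        alpha = k / n_path_steps
        x_interp = (x_adv + alpha * (x - x_adv)).detach().requires_grad_(True)
        loss = model(x_interp).max(dim=1).values.sum()
        total_grad += torch.autograd.grad(loss, x_interp)[0]
    attribution = (x - x_adv) * total_grad / n_path_steps
    # Treating the attribution norm as the detection score is a simplification
    # of this sketch; the paper derives scores from each intermediate layer.
    return attribution.flatten(1).norm(dim=1)
```

In practice, a threshold on such a score (calibrated on held-out ID data) would separate ID from OOD inputs; the official repository above should be consulted for the actual layer-wise procedure and scoring rule.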
Lay Summary: Modern AI systems often struggle when they are asked to handle unfamiliar data — for example, a facial recognition model trained on studio portraits might fail on blurry street photos. Detecting when an input is “out-of-distribution” (OOD), meaning it is different from what the model was trained on, is a key challenge in making AI safer and more reliable. Some recent methods try to solve this by looking at how sensitive the model is to small changes in the input, which reveals how the model “understands” each example. But in high-dimensional data like images, these sensitivity patterns can become noisy or misleading — especially when normal examples are slightly altered. We propose a new technique, called S & I, that improves this detection by breaking the model into parts and carefully tracking how it reacts to each input, step by step. This gives a more accurate picture of how the model responds to known versus unknown data. Our method outperforms existing approaches at identifying unfamiliar data across several image datasets, helping make AI systems more trustworthy in the real world.
Link To Code: https://github.com/LMBTough/S-I
Primary Area: Social Aspects
Keywords: Out-of-Distribution Detection, Attribution
Submission Number: 8247