Sample-specific Noise Injection for Diffusion-based Adversarial Purification

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: *Diffusion-based purification* (DBP) methods aim to remove adversarial noise from an input sample by first injecting Gaussian noise through a forward diffusion process and then recovering the clean example through a reverse generative process. How much Gaussian noise is injected into the input sample is key to the success of DBP methods; existing methods control it with a constant noise level $t^*$ shared by all samples. In this paper, we discover that the optimal $t^*$ can in fact differ from sample to sample. Intuitively, the cleaner a sample is, the less noise it should be injected with, and vice versa. Motivated by this finding, we propose a new framework, called Sample-specific Score-aware Noise Injection (SSNI). Specifically, SSNI uses a pre-trained score network to estimate how much a data point deviates from the clean data distribution (i.e., score norms). Then, based on the magnitude of the score norms, SSNI applies a reweighting function to adaptively adjust $t^*$ for each sample, achieving sample-specific noise injection. Empirically, incorporating our framework into existing DBP methods yields a notable improvement in both accuracy and robustness on CIFAR-10 and ImageNet-1K, highlighting the necessity of allocating *distinct noise levels to different samples* in DBP methods. Our code is available at: https://github.com/tmlr-group/SSNI.
Lay Summary: *Diffusion-based purification* (DBP) is a promising framework for defending against *adversarial examples* (AEs). However, existing DBP methods apply the same noise level to all inputs, regardless of how close they are to the clean data distribution. This sample-shared strategy can lead to over-distortion of clean examples (CEs) or insufficient purification of AEs. In this work, we propose ***S***ample-specific ***S***core-aware ***N***oise ***I***njection (SSNI), a new framework that adapts the noise injection level to each input based on its estimated distance from the clean data manifold. We compute this distance using the score norm, derived from a pre-trained score network that estimates the gradient of the log-density function. Intuitively, cleaner samples have lower score norms and are injected with less noise, while AEs with higher score norms are purified more aggressively. SSNI is lightweight, general-purpose, and easily integrable into existing DBP pipelines. Experiments on CIFAR-10 and ImageNet-1K show that SSNI improves both clean and robust accuracy across multiple baselines, while maintaining computational efficiency. By tailoring purification to each input, SSNI strikes a better balance between robustness and utility, and generalizes well to unseen attacks.
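The mechanism described above (score norm → per-sample noise level) can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the real method queries a pre-trained score network $s_\theta(x)$, whereas here we substitute the analytic score of a standard Gaussian, $s(x) = -x$, and the function names (`score_norm`, `reweight_t`) and the linear reweighting rule are our own hypothetical choices.

```python
import numpy as np

def score_norm(x):
    """Per-sample L2 norm of the score.

    Stand-in for a pre-trained score network: for N(0, I) the true score
    is s(x) = -x, so samples far from the mode get larger norms.
    """
    s = -x
    return np.linalg.norm(s.reshape(len(x), -1), axis=1)

def reweight_t(norms, t_min=50.0, t_max=150.0):
    """Map score norms to sample-specific noise levels (toy linear rule).

    Samples with the smallest norm in the batch (closest to the clean
    manifold) receive t_min; those with the largest norm receive t_max.
    """
    lo, hi = norms.min(), norms.max()
    if hi == lo:                          # degenerate batch: one shared level
        return np.full_like(norms, (t_min + t_max) / 2.0)
    w = (norms - lo) / (hi - lo)          # normalize norms to [0, 1]
    return t_min + w * (t_max - t_min)    # less noise for cleaner samples

# Toy batch: "clean" samples plus perturbed copies with larger score norms.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(4, 8))
perturbed = clean + rng.normal(0.0, 2.0, size=(4, 8))
batch = np.vstack([clean, perturbed])
t_star = reweight_t(score_norm(batch))    # one noise level per sample
```

Each entry of `t_star` would then set the forward-diffusion stopping time for its sample, in place of the single constant $t^*$ used by standard DBP pipelines.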
Link To Code: https://github.com/tmlr-group/SSNI
Primary Area: Deep Learning->Robustness
Keywords: adversarial purification, adversarial robustness, diffusion-based adversarial purification, accuracy-robustness trade-off
Submission Number: 13990