DiffRA: universal restorative adversarial attack based on diffusion model

Mingwen Shao, Wenjie Liu, Lingzhuang Meng, Huan Liu, Xiaodong Tan

Published: 2025, Last Modified: 04 Nov 2025, Multim. Syst. 2025, CC BY-SA 4.0

Abstract: Image restoration is commonly used as a pre-processing step in artificial intelligence pipelines to improve the visual quality of images. However, we find that the restoration process can introduce uncertainties that cause the restored image to mislead Deep Neural Networks (DNNs). Based on this insight, we propose DiffRA, a universal restorative attack scheme based on the diffusion model, which embeds imperceptible perturbations while restoring various degraded images, thereby achieving a more covert attack. Specifically, to embed adversarial perturbations during restoration, we propose a novel adversarial-guided strategy that uses gradient information from a classifier as a condition to guide the sampling process of the restoration diffusion model, so that the restored image gradually deviates from its original class at the feature level. To further enhance the transferability of adversarial examples, we add CLIP guidance as a supplement, pushing the generated images toward a mismatch with the correct class prompts at the semantic level. With these designs, DiffRA naturally introduces adversarial perturbations into the restored image during the restoration process, generating adversarial examples that are very close to clean images. Extensive experiments demonstrate that DiffRA effectively restores high-quality images from degraded images and achieves a high attack success rate on DNNs. Code is released at https://github.com/dddoudj/DiffRA_main.
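
The adversarial-guided idea described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes a toy linear softmax classifier in place of a DNN, a single hypothetical "restored" estimate in place of a full diffusion sampling loop, and it omits CLIP guidance entirely. The names `guided_step`, `loss_gradient`, and the `scale` parameter are illustrative assumptions. The sketch only shows how a classifier gradient can nudge a restored estimate away from its true class while keeping the change small.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_of(weights, x):
    """Toy linear classifier (assumption): logits = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def true_class_loss(weights, x, label):
    """Cross-entropy of the true class; higher means more misleading."""
    return -math.log(softmax(logits_of(weights, x))[label])


def loss_gradient(weights, x, label):
    """Gradient of the cross-entropy with respect to the input x."""
    probs = softmax(logits_of(weights, x))
    residual = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [
        sum(residual[k] * weights[k][d] for k in range(len(weights)))
        for d in range(len(x))
    ]


def guided_step(restored_mean, weights, label, scale):
    """Shift the restored estimate along the loss gradient (hypothetical rule)."""
    grad = loss_gradient(weights, restored_mean, label)
    return [m + scale * g for m, g in zip(restored_mean, grad)]


# Hypothetical two-pixel "image" and two-class classifier.
weights = [[1.0, 0.0], [0.0, 1.0]]
mean = [0.8, 0.2]  # stand-in for the restoration model's estimate at one step
guided = guided_step(mean, weights, label=0, scale=0.1)
```

In this toy setting the guided estimate has a higher true-class loss than the unguided one, while each coordinate moves by only a small amount. In a diffusion sampler, a step of this kind would presumably be applied repeatedly across sampling steps, which is consistent with the abstract's description of the image gradually deviating from its original class.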

External IDs: dblp:journals/mms/ShaoLMLT25