Abstract: Deep learning techniques now play a crucial role in fields such as computer vision and natural language processing, yet the security and privacy of deep learning models remain under threat. One common threat is the adversarial attack, in which an attacker constructs adversarial examples that appear indistinguishable from normal samples to the human eye yet induce a deep learning model to make misjudgments, potentially leading to catastrophic consequences in deployed systems. Additionally, deep learning models rely heavily on large amounts of training data; if this data is exposed, users' privacy and security are compromised. The challenge lies in training deep learning models that are robust against adversarial attacks while also protecting user privacy. Most existing detection and defense methods are effective only against specific types of attacks and depend too heavily on the particular principles of those attacks; in particular, they do not consider privacy preservation. In this paper, a privacy-preserving method for adversarial example detection and defense is proposed, which uses a denoiser and a detector to defend against adversarial examples while employing a differential privacy mechanism to protect user privacy. Experimental results demonstrate that this method provides differential privacy protection together with adversarial example detection and defense without significantly affecting the model's accuracy. It effectively defends against common adversarial attack methods such as the Fast Gradient Sign Method (FGSM), its iterative variants, and Projected Gradient Descent (PGD). Furthermore, the method achieves a trade-off among the model's privacy, adversarial robustness, and accuracy.
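As a concrete illustration of the simplest attack named above, the following minimal sketch generates an FGSM adversarial example in PyTorch; the model, the perturbation budget epsilon, and the cross-entropy loss are illustrative assumptions and are not taken from the paper's proposed defense.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=0.03):
        # Hypothetical FGSM sketch: perturb the input one step in the
        # direction of the sign of the loss gradient w.r.t. the input.
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        x_adv = x + epsilon * x.grad.sign()
        # Keep the adversarial example in the valid pixel range [0, 1].
        return x_adv.clamp(0.0, 1.0).detach()

PGD can be viewed as applying this step repeatedly with a smaller step size, projecting back onto the epsilon-ball around the original input after each iteration.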