Abstract: In this paper, we propose WavePurifier, an audio purification framework to defend against audio adversarial attacks. Audio adversarial attacks craft adversarial examples or perturbations to attack automatic speech recognition (ASR) models. Although existing defense mechanisms can detect such attacks and raise alarms, they fail to recover or preserve the benign commands. Consequently, users' benign commands are denied. Unlike existing defenses, WavePurifier aims to purify adversarial examples, thereby recovering the user's benign commands. We find that the forward diffusion process of a diffusion model effectively eliminates perturbations, whereas the reverse diffusion process restores benign speech. Based on this finding, we develop a hierarchical diffusion model to defend against audio adversarial examples; this model can purify different spectrogram bands to varying degrees. To validate the performance of WavePurifier, we purify adversarial examples from 3 different adversarial attacks in 140 distinct settings. In total, we collect 78,864 diffused spectrograms and 21,000 purified audio samples. We then evaluate WavePurifier on 2 different ASR models, 4 commercial speech-to-text APIs, and 2 real-world attack scenarios, and compare it against 7 existing defense approaches. Our results show that WavePurifier is a universal framework that adapts to diverse attacks with the same hyperparameters. Notably, WavePurifier outperforms existing methods, achieving the lowest character error rate (CER) and word error rate (WER) as well as a high purification success rate against different attacks.
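The idea of diffusing different spectrogram bands to varying degrees can be illustrated with the closed-form forward step of a standard DDPM-style diffusion model, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, applied with a separate step t per frequency band. The following is a minimal sketch under stated assumptions, not the authors' implementation: the linear noise schedule, the three-band split, the per-band step counts, and the names `linear_alpha_bar` and `forward_diffuse_bands` are all hypothetical, and the reverse (denoising) process is omitted because it requires a trained model.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse_bands(spec, band_edges, band_steps, alpha_bar, rng=None):
    """Apply the closed-form forward diffusion q(x_t | x_0) band by band.

    spec:       (freq_bins, frames) spectrogram, assumed already normalized.
    band_edges: hypothetical list of (lo, hi) frequency-bin ranges.
    band_steps: diffusion step t (1-indexed) per band; larger t adds more noise,
                and t = 0 leaves the band untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float)  # copy so the input is not modified
    for (lo, hi), t in zip(band_edges, band_steps):
        if t <= 0:
            continue
        a = alpha_bar[t - 1]
        noise = rng.standard_normal(out[lo:hi].shape)
        # Scale the clean band down and mix in Gaussian noise.
        out[lo:hi] = np.sqrt(a) * out[lo:hi] + np.sqrt(1.0 - a) * noise
    return out

# Usage with a random stand-in for an 80-bin, 100-frame spectrogram.
# Hypothetical split: low band untouched, higher bands diffused progressively more.
rng = np.random.default_rng(0)
spec = rng.standard_normal((80, 100))
alpha_bar = linear_alpha_bar()
diffused = forward_diffuse_bands(
    spec, [(0, 20), (20, 50), (50, 80)], [0, 50, 200], alpha_bar, rng
)
```

Choosing a larger step for a band drowns out more of any adversarial perturbation in that band but also destroys more of the original speech, which the reverse process must then restore; a per-band step makes that trade-off tunable across frequencies.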
External IDs: DOI: 10.1145/3636534.3690692