Keywords: Adversarial Robustness, Adversarial Perturbations, Security, Safety, Speech Enhancement, Speech Denoising, Noise Suppression, Deep Noise Suppression, Psychoacoustic Masking
TL;DR: Deep speech denoising models are unexpectedly vulnerable to imperceptible adversarial noise and can be induced to output unintelligible gibberish.
Abstract: Deep noise suppression (DNS) models are widely used in high-stakes speech applications.
However, we show that four recent DNS models can each be reduced to outputting unintelligible gibberish by adding psychoacoustically hidden adversarial noise, even in low-background-noise and simulated over-the-air settings. For three of the models, a small transcription study with audio and multimedia experts confirms that the attacked audio is unintelligible; a complementary ABX study shows that the adversarial noise is generally imperceptible, with some variance across participants and samples.
Although we also establish several negative results on targeted attacks and cross-model transfer, our findings highlight the need for practical countermeasures before open-source DNS systems can be used in safety-critical applications.
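The paper's attack implementation is not shown on this page, but a minimal PyTorch sketch of the kind of untargeted, psychoacoustically masked attack the abstract describes might look as follows. Everything here is an illustrative assumption rather than the authors' method: `dns_model` stands for any differentiable waveform-to-waveform denoiser, `masking_threshold` for a precomputed per-bin psychoacoustic bound shaped like the perturbation's STFT magnitude, and all hyperparameters are placeholders.

```python
import torch

def psychoacoustic_attack(dns_model, audio, masking_threshold,
                          steps=500, lr=1e-3, penalty=10.0):
    """Search for a perturbation `delta` that makes the denoiser's output
    diverge from the clean reference, while penalizing spectral energy of
    `delta` above an (assumed given) psychoacoustic masking threshold.

    dns_model:          frozen waveform-to-waveform denoiser (assumption)
    audio:              1-D input waveform tensor of shape (T,)
    masking_threshold:  per-bin bound matching the STFT magnitude shape,
                        below which the perturbation is assumed inaudible
    """
    delta = torch.zeros_like(audio, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)  # only delta is optimized
    window = torch.hann_window(512)
    for _ in range(steps):
        opt.zero_grad()
        denoised = dns_model(audio + delta)
        # Untargeted objective: maximize the distance between the model's
        # output and the clean signal, pushing it toward gibberish.
        divergence = -torch.nn.functional.mse_loss(denoised, audio)
        # Imperceptibility term: penalize perturbation energy that exceeds
        # the masking threshold derived from the carrier audio.
        mag = torch.stft(delta, n_fft=512, window=window,
                         return_complex=True).abs()
        excess = torch.relu(mag - masking_threshold).mean()
        (divergence + penalty * excess).backward()
        opt.step()
    return delta.detach()
```

This sketch folds the masking constraint into the loss as a soft penalty; a projection step that clips `delta`'s spectrum to the threshold after each update would be an equally plausible variant.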
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22135