Keywords: adversarial attacks, transferability, black-box, self-supervised learning, speech recognition
TL;DR: We show that recent self-supervised ASR models are uniquely vulnerable to adversarial attacks that require no access to the target model.
Abstract: A targeted adversarial attack produces audio samples that can force an Automatic Speech Recognition (ASR) system to output attacker-chosen text. To exploit ASR models in real-world, black-box settings, an adversary can leverage the \textit{transferability} property, i.e. the fact that an adversarial sample crafted for a proxy ASR model can also fool a different, remote ASR model. Recent work has shown that transferability against large ASR models is extremely difficult to achieve. In this work, we show that modern ASR architectures, specifically those based on Self-Supervised Learning (SSL), are uniquely vulnerable to transferable attacks. We demonstrate this phenomenon by evaluating state-of-the-art self-supervised ASR models such as Wav2Vec2, HuBERT, Data2Vec, and WavLM. We show that with relatively low-level additive noise, at a 30 dB Signal-to-Noise Ratio, we achieve targeted transferability with up to 80\% accuracy. We then use an ablation study to show that Self-Supervised Learning is a major cause of this phenomenon. Our results are of dual interest: they show that modern ASR architectures are uniquely vulnerable to adversarial security threats, and they help in understanding the specificities of SSL training paradigms.
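To make the 30 dB Signal-to-Noise Ratio budget concrete, here is a minimal Python sketch (not taken from the paper; the helper names and the random placeholder signal are illustrative) of how the SNR of an additive adversarial perturbation is typically measured, and how a perturbation can be rescaled to meet a target SNR:

import numpy as np

def snr_db(clean: np.ndarray, perturbation: np.ndarray) -> float:
    # Signal-to-Noise Ratio (dB) of an additive perturbation relative to the clean audio.
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(perturbation ** 2)
    return 10.0 * np.log10(signal_power / noise_power)

# Illustrative example: rescale a perturbation so the adversarial audio sits at ~30 dB SNR.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # placeholder for 1 s of 16 kHz audio
delta = rng.standard_normal(16000)   # placeholder for an adversarial perturbation
target_snr = 30.0
scale = np.sqrt(np.mean(clean ** 2) / (np.mean(delta ** 2) * 10 ** (target_snr / 10)))
adversarial = clean + scale * delta
print(f"SNR: {snr_db(clean, scale * delta):.1f} dB")  # ~30.0 dB

A higher SNR means a quieter, less perceptible perturbation, so a 30 dB budget constrains the attack to relatively low-level noise.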
Supplementary Material: zip
Submission Number: 28