Speed Master: Quick or Slow Play to Attack Speaker Recognition

Zhe Ye, Wenjie Zhang, Ying Ren, Xiangui Kang, Diqun Yan, Bin Ma, Shiqi Wang

Published: 26 Feb 2025, Last Modified: 12 Apr 2025, AAAI 2025, CC BY 4.0

Abstract: Backdoor attacks pose a significant threat during a model's training phase. Attackers craft pre-defined triggers to compromise deep neural networks, so that the model classifies clean samples accurately at inference time yet misclassifies samples containing these triggers. Recent studies have shown that speaker recognition systems trained on large-scale data are susceptible to backdoor attacks. Existing attacks employ inconspicuous ambient sounds as triggers. However, these sounds are not inherently part of the training samples themselves. Triggers can instead be designed to maintain an intrinsic connection with the original speech, which enhances stealthiness. Our paper presents a novel attack methodology named Speed Master, which undermines deep neural networks by manipulating the speed of speech samples. Specifically, we execute poison-only backdoor attacks using direct or tempo speed adjustment. In real-world scenarios, individuals are free to adjust their speaking rate, which can vary according to the context. Consequently, users typically perceive fluctuations in a speaker's speech rate as natural, making such changes unlikely to arouse suspicion. Furthermore, detecting such subtle adjustments is challenging for users without reference speech. Our comprehensive experiments demonstrate that Speed Master can achieve an attack success rate (ASR) over 99% in the digital domain with only a 0.6% poisoning rate. Additionally, we validate the feasibility of Speed Master in the real world and its resistance to typical defensive measures.
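To make the poisoning step concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "direct" speed adjustment means resampling the waveform, which changes duration and pitch together, and that poisoned samples are relabelled to an attacker-chosen target speaker; the abstract states neither detail. The names `change_speed`, `poison_dataset`, `target_label`, and `factor` are hypothetical. Tempo adjustment, which changes duration while preserving pitch, requires a time-scale modification algorithm and is not shown.

```python
import random


def change_speed(samples, factor):
    """Resample a waveform by linear interpolation (assumed 'direct' speed change).

    factor > 1 shortens the signal (faster playback); factor < 1 lengthens it.
    Played back at the original sample rate, pitch shifts along with duration.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, round(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = min(i * factor, last)   # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def poison_dataset(dataset, target_label, rate, factor, seed=0):
    """Speed-change a fraction `rate` of (waveform, label) pairs.

    Relabelling to `target_label` is an assumption of this sketch.
    Returns the new dataset and the set of poisoned indices.
    """
    rng = random.Random(seed)
    n_poison = round(rate * len(dataset))
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    poisoned = []
    for idx, (wave, label) in enumerate(dataset):
        if idx in chosen:
            poisoned.append((change_speed(wave, factor), target_label))
        else:
            poisoned.append((wave, label))
    return poisoned, chosen
```

With a 0.6% poisoning rate, a training set of 1,000 utterances would have 6 of them modified; the speed factor of the trigger (for example 1.2) is an illustrative value, not one taken from the paper.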