Abstract: With the widespread deployment of deep learning systems in many applications, adversaries have strong incentives to attack these systems for malicious purposes. Reports have indicated that backdoor attacks on deep neural networks represent a novel form of threat: the adversary injects a backdoor into a benign model and then misleads the model into classifying any input containing the backdoor trigger as a target label specified by the adversary. Existing research mainly focuses on backdoor attacks against image and text models, while little attention has been paid to backdoor attacks on text-to-speech (TTS) models. We conduct a systematic investigation of backdoor attacks on TTS models and propose BadTTS, the first backdoor attack against TTS models. BadTTS is a general backdoor attack framework that tampers with input texts at three semantic levels, namely Char-Backdoor, Word-Backdoor, and Sentence-Backdoor, to make the model generate malicious output speech. Our method not only injects backdoors into a TTS model efficiently but is also stealthy and has little impact on the quality of the synthesized speech. We implement the backdoor attack in a black-box fine-tuning setting, where the adversary has no knowledge of the model architecture and controls only a small amount of training data. Empirical experiments on three representative and widely studied TTS models show that backdoors can be injected into TTS models within a few fine-tuning steps. Additionally, we conduct experiments exploring the impact of different types of triggers, as well as the models' intermediate outputs, which provide insights for potential defenses against backdoor attacks.
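The three trigger levels described in the abstract can be illustrated with a minimal data-poisoning sketch. All function names, trigger strings, and the poisoning scheme below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
# Hypothetical sketch of the three trigger-insertion levels (Char-, Word-,
# and Sentence-Backdoor). Trigger choices here are assumptions; the actual
# triggers used by BadTTS may differ.

def char_trigger(text: str, trigger: str = "@") -> str:
    # Char-level: insert a rare character into the input text.
    return trigger + text

def word_trigger(text: str, trigger: str = "veritas") -> str:
    # Word-level: prepend a low-frequency trigger word.
    return f"{trigger} {text}"

def sentence_trigger(text: str, trigger: str = "The sky is clear today.") -> str:
    # Sentence-level: append an entire trigger sentence.
    return f"{text} {trigger}"

def poison_dataset(samples, transform, target_audio):
    # Poison (text, speech) training pairs: apply the trigger transform to
    # the text and replace the target speech with adversary-chosen audio,
    # so fine-tuning on these pairs associates trigger with malicious output.
    return [(transform(text), target_audio) for text, _ in samples]
```

For example, `poison_dataset([("hello world", "clean.wav")], word_trigger, "malicious.wav")` yields a fine-tuning pair whose text carries the word-level trigger and whose target speech is the adversary's chosen audio.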