CuckooAttack: Towards Practical Backdoor Attack against Automatic Speech Recognition Systems

Published: 2025, Last Modified: 12 Jan 2026 · IEEE Trans. Dependable Secur. Comput. 2025 · CC BY-SA 4.0
Abstract: Deep learning-based automatic speech recognition (ASR) systems transcribe input audio of arbitrary duration into character sequences and are widely used in daily life. However, recent research has found that deep learning models are vulnerable to backdoor attacks: a malicious adversary can embed backdoor functionality into a model during the training phase and manipulate the backdoored model's output by adding a specific trigger to the input during the inference phase. Unfortunately, TrojanModel, the existing state-of-the-art backdoor attack against ASR systems (Zong et al., S&P'23), relies on an overly strong assumption: beyond data poisoning, the adversary must also modify the model structure, which significantly limits its practicality. In this paper, we propose CuckooAttack, a more practical backdoor attack against ASR systems that requires only poisoning a small portion of the training data. We first construct a phoneme-level auxiliary dataset to generate effective, robust, and unnoticeable triggers while substantially lowering computational expense. Considering real-world ASR application scenarios, we propose an adaptive trigger injection mechanism that ensures the backdoor can be activated on variable-duration input audio under asynchronous temporal conditions. To further enhance the efficacy of CuckooAttack, we design a character-filling strategy tailored to ASR for constructing poisoned samples, which helps the model establish backdoor connections. Extensive experiments show that CuckooAttack achieves performance comparable to TrojanModel under a weaker assumption: an attack success rate of about 99% in the digital domain and over 90% in the physical domain, at a poison rate of only 1%.
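
To make the poisoning setup concrete, below is a minimal sketch of how one poisoned training pair might be constructed: a short trigger waveform is mixed into the host audio at a random offset (mirroring the adaptive injection under asynchronous temporal conditions described in the abstract), and the label is replaced with the adversary's target transcript. All function names, the target phrase, and the additive-mixing choice are illustrative assumptions, not the paper's implementation; in particular, the character-filling label strategy is not reproduced here.

```python
import numpy as np

def inject_trigger(audio, trigger, rng=None):
    """Overlay a short trigger waveform at a random offset so the
    backdoor can fire regardless of where the trigger lands in
    variable-duration input audio. Assumes 1-D float waveforms
    normalized to [-1, 1]; additive mixing is an assumption, not
    necessarily the paper's injection method."""
    rng = rng or np.random.default_rng()
    if len(trigger) >= len(audio):
        raise ValueError("trigger must be shorter than the host audio")
    start = rng.integers(0, len(audio) - len(trigger))
    poisoned = audio.copy()
    poisoned[start:start + len(trigger)] += trigger
    return np.clip(poisoned, -1.0, 1.0)

def poison_sample(audio, trigger, target="OPEN THE DOOR"):
    """Pair the trigger-injected audio with the adversary's target
    transcript to form one poisoned training example. The target
    phrase here is a hypothetical placeholder."""
    return inject_trigger(audio, trigger), target
```

At a 1% poison rate, such pairs would replace only a small fraction of the clean training set, which is what distinguishes this threat model from TrojanModel's requirement of structural changes to the model itself.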