Stealthy Backdoor Attack Towards Federated Automatic Speaker Verification

Longling Zhang, Lyqi Liu, Dan Meng, Jun Wang, Shengshan Hu

Published: 2024, Last Modified: 05 Mar 2025, ICASSP 2024, CC BY-SA 4.0

Abstract: Automatic speaker verification (ASV) authenticates individuals based on their distinct vocal patterns and plays a pivotal role in applications such as voice-based device unlocking. An ASV system comprises three stages: training, registration, and validation. The model is trained on voice data in the training stage, extracts a speaker's vocal features in the registration stage, and compares those features against incoming speech in the validation stage. Modern ASV models, primarily built on DNN architectures, require extensive training data. Federated learning (FL) lets multiple clients train a shared model collaboratively while keeping their data private. Because of its open architecture, however, FL is vulnerable to backdoor attacks. Mounting a stealthy backdoor attack in FL is nonetheless challenging: data heterogeneity diminishes the attack's generalization, and conspicuous triggers make the attack easy to detect. In this paper, we propose a Federated Stealthy Backdoor Attack method (FedSBA). FedSBA aims to improve the attack model's generalization, enhance its persistence, and evade anomaly detection under heterogeneous data distributions. FedSBA constructs an attack model based on a personalized transformer and incorporates a stealthy trigger. We also propose a defensive strategy that uses an adaptive weight aggregation scheme. FedSBA's stealthiness and effectiveness are demonstrated by its superior performance compared with previous work.
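
To make the registration and validation stages concrete, the following is a minimal sketch and not the paper's model. It assumes each utterance arrives as a (frames, features) array, uses a hypothetical placeholder `embed` in place of a trained speaker encoder, and scores with cosine similarity against an arbitrary threshold of 0.7. None of these choices are stated in the abstract.

```python
import numpy as np


def embed(features: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained speaker encoder.

    Averages frame-level features and L2-normalises the result.
    A real ASV system would use a trained DNN here.
    """
    v = features.mean(axis=0)
    return v / np.linalg.norm(v)


def register(utterances: list[np.ndarray]) -> np.ndarray:
    """Registration: build a speaker profile from enrollment utterances."""
    profile = np.mean([embed(u) for u in utterances], axis=0)
    return profile / np.linalg.norm(profile)


def validate(profile: np.ndarray, features: np.ndarray,
             threshold: float = 0.7) -> bool:
    """Validation: accept if cosine similarity reaches the threshold.

    Both vectors are unit length, so the dot product is the cosine.
    The threshold value is an illustrative assumption.
    """
    score = float(np.dot(profile, embed(features)))
    return score >= threshold
```

In this setting, a backdoor attack would aim to make `validate` accept an utterance carrying the attacker's trigger regardless of who is speaking, while leaving behavior on clean utterances unchanged.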