SML: A Backdoor Defense for Non-Intrusive Speech Quality Assessment via Semi-Supervised and Multi-Task Learning

Ying Ren, Wenjie Zhang, Jiahong Ye, Jie Li, Diqun Yan, Bin Ma

Published: 01 Jan 2025, Last Modified: 22 Jul 2025, ICASSP 2025, CC BY-SA 4.0

Abstract: Non-intrusive speech quality assessment (NISQA) is widely used in downstream speech tasks because it can predict the quality of speech without a reference signal. However, few researchers have focused on the backdoor security of NISQA, even though backdoor defenses have been extensively studied as a way to mitigate the threat of malicious modifications to deep neural networks. In particular, defenses based on semi-supervised learning achieve excellent defensive performance by depriving backdoor attacks of their most essential need. These defenses, however, rely on data-augmentation consistency and thus cannot be applied to NISQA. In this work, we propose a backdoor defense based on semi-supervised and multi-task learning (SML). The semi-supervised component rests on a simple assumption: two similar models should produce outputs that are as consistent as possible for the same input. The multi-task component further improves the prediction of the mean opinion score (MOS) by jointly learning to predict perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and the speech distortion index (SDI). Extensive experiments involving five backdoor defenses against five backdoor attacks on two benchmark datasets demonstrate the superiority of our SML approach.
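
The abstract does not state the training objective, so the following is only an illustrative sketch of how a model-consistency term and a multi-task term could be combined. Everything in it is an assumption made for illustration: the function names, the use of mean squared error, the equal task weights, and the weighting factor `lam` are not taken from the paper.

```python
# Illustrative sketch only: NOT the paper's actual loss.
# Assumptions: MSE for every term, equal task weights, hypothetical names.

TASKS = ("mos", "pesq", "stoi", "sdi")  # MOS plus the three auxiliary tasks


def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(preds_a, preds_b):
    """Disagreement between two similar models on the same inputs.

    No labels are needed, and no data augmentation is involved.
    """
    return mse(preds_a, preds_b)


def multitask_loss(preds, targets, weights=None):
    """Weighted sum of per-task regression errors on labeled data."""
    if weights is None:
        weights = {task: 1.0 for task in TASKS}  # assumed equal weighting
    return sum(weights[t] * mse(preds[t], targets[t]) for t in TASKS)


def combined_loss(labeled_preds, labeled_targets, unl_a, unl_b, lam=1.0):
    """Supervised multi-task term plus lam times the consistency term."""
    supervised = multitask_loss(labeled_preds, labeled_targets)
    return supervised + lam * consistency_loss(unl_a, unl_b)
```

In this sketch the consistency term compares the outputs of two models on identical inputs, which is one way to avoid the augmentation-based consistency that the abstract says does not transfer to NISQA.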