Abstract: Speaker recognition systems (SRSs) are commonly used for biometric identification, but they are vulnerable to adversarial attacks. Several defenses have been proposed, but they incur high costs in additional data and computational resources to ensure robustness. To address these issues, this paper proposes a low-cost input reconstruction defense method called adaptive F-ratio-based partial masking (AFPM), which uses a robust feature extraction process to achieve strong defense capability. The underlying distribution of non-robust features is explored, and these features are filtered out by partial masking (PM), which keeps the defense construction cost low. An F-ratio-based PM (FPM) defense strategy is then proposed by integrating the F-ratio, which reflects the weight of each frequency band for distinguishing between speakers, to balance classification accuracy and defense capability. AFPM, which adds an adaptive threshold calculation algorithm to FPM, is proposed to further improve defense capability and flexibility. Comparative experimental results show that AFPM is low-cost, highly defensive and universal. Constructing AFPM involves no training, and deploying it requires only fine-tuning the protected SRSs rather than retraining them. While maintaining the classification accuracy at 99.42%, the average defense capability of AFPM against five white-box adaptive attacks is 90.89%, which is 9.23% better than that of the low-cost input reconstruction defense method and 3.77% better than that of the high-cost Parallel WaveGAN (PWG) defense approach. Against grey- and black-box adaptive attacks, FAKEBOB and Kenansville, AFPM reaches maximum defense effects of 96.01% and 74.49%, respectively, surpassing PWG by 4.5% and 65.82%. Furthermore, AFPM is universal and capable of protecting various SRSs against different attack strengths.
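To make the role of the F-ratio concrete, the following is a minimal sketch, not the paper's implementation. It assumes the conventional definition of the F-ratio (variance of per-speaker means divided by the average within-speaker variance, computed per frequency band) and a hypothetical masking rule that zeroes bands whose F-ratio falls below a fixed threshold. The function names, the direction of the masking rule, and the fixed threshold are illustrative assumptions; AFPM computes its threshold adaptively, and that step is not reproduced here.

```python
import numpy as np


def f_ratio(features, speaker_ids):
    """Per-band F-ratio (conventional definition, assumed here).

    features:    array of shape (n_utterances, n_bands), e.g. band energies
    speaker_ids: array of shape (n_utterances,), one speaker label per row
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)

    # Mean of each band for each speaker: shape (n_speakers, n_bands)
    speaker_means = np.stack(
        [features[speaker_ids == s].mean(axis=0) for s in speakers]
    )
    # Between-speaker variance: spread of the speaker means in each band
    between = speaker_means.var(axis=0)
    # Within-speaker variance: average spread inside each speaker, per band
    within = np.mean(
        [features[speaker_ids == s].var(axis=0) for s in speakers], axis=0
    )
    # Small constant avoids division by zero for constant bands
    return between / (within + 1e-12)


def mask_low_f_ratio_bands(spectrogram, ratios, threshold):
    """Hypothetical partial masking: zero bands with F-ratio below threshold.

    spectrogram: array of shape (n_bands, n_frames)
    """
    keep = np.asarray(ratios) >= threshold
    return np.asarray(spectrogram, dtype=float) * keep[:, None]


# Toy data: band 0 separates the two speakers, band 1 does not.
feats = np.array([[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [1.1, 1.0]])
ids = np.array(["a", "a", "b", "b"])
ratios = f_ratio(feats, ids)
masked = mask_low_f_ratio_bands(np.ones((2, 3)), ratios, threshold=1.0)
```

In this toy example the first band receives a much larger F-ratio than the second, so only the second band is zeroed. Whether low-ratio or high-ratio bands should be masked, and by how much, is a design choice that the abstract does not specify.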