Q-PGD: A Stealthy and Effective Speaker Recognition Adversarial Attacks Based on Quantized Projected Gradient Descent

Jiahui Wang, Yulong Fu, Jincheng Yang

Published: 2025, Last Modified: 09 Jan 2026, Venue: IJCNN 2025, License: CC BY-SA 4.0

Abstract: With the widespread adoption of neural network models in Speaker Recognition Systems (SRS), their vulnerability to adversarial attacks has become a significant security concern. However, research on adversarial attacks against SRS focuses mainly on perturbing the audio signal in the time domain, while studies of frequency-domain attacks remain limited. In this work, inspired by the MP3 compression process, we propose a frequency-domain quantization attack based on Projected Gradient Descent (PGD). The method uses the Modified Discrete Cosine Transform (MDCT) and its inverse to convert audio signals between the time domain and the frequency domain, and iteratively optimizes a trainable quantization table to compress and manipulate the frequency-domain magnitudes. On both the closed and open data sets used by current mainstream speaker recognition models, the proposed method not only achieves a 100% attack success rate but also makes the attack stealthier. It also maintains a high attack success rate against SRS equipped with transformation-based defense mechanisms. To our knowledge, this work is the first successful attack on the Cam++ SRS model.
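
The pipeline named in the abstract (MDCT, a per-bin quantization table, inverse MDCT, and a projected gradient update of the table) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The quantization rule `round(X / q) * q`, the sine window, the box constraint `[q_min, q_max]`, the sign-gradient step, and all function names are assumptions made for illustration. The gradient of the attack loss with respect to the table is taken as given, since in practice it would come from backpropagation through a speaker recognition model.

```python
import math

def sine_window(length):
    """Sine window; satisfies the Princen-Bradley condition for 50% overlap."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """MDCT of one frame of 2N samples, giving N coefficients."""
    N = len(frame) // 2
    return [sum(x * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n, x in enumerate(frame))
            for k in range(N)]

def imdct(coeffs):
    """Inverse MDCT of N coefficients, giving 2N time-aliased samples."""
    N = len(coeffs)
    return [2.0 / N * sum(X * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k, X in enumerate(coeffs))
            for n in range(2 * N)]

def analyze(signal, N):
    """Windowed MDCT frames with hop N. Assumes len(signal) is a multiple of N."""
    w = sine_window(2 * N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    return [mdct([w[n] * padded[s + n] for n in range(2 * N)])
            for s in range(0, len(padded) - N, N)]

def synthesize(frames, N):
    """Windowed overlap-add of the inverse MDCT frames, with padding removed."""
    w = sine_window(2 * N)
    out = [0.0] * (N * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for n, y in enumerate(imdct(coeffs)):
            out[i * N + n] += w[n] * y
    return out[N:-N]

def quantize(frames, table):
    """Assumed rule: snap each coefficient to a multiple of its bin's step."""
    return [[round(X / q) * q for X, q in zip(coeffs, table)] for coeffs in frames]

def pgd_table_step(table, grad, alpha, q_min, q_max):
    """One sign-gradient ascent step on the table, projected onto [q_min, q_max].

    `grad` is assumed to be d(attack loss)/d(table), supplied by the caller.
    """
    def sign(g):
        return (g > 0) - (g < 0)
    return [min(max(q + alpha * sign(g), q_min), q_max)
            for q, g in zip(table, grad)]

# Toy usage on a synthetic signal (no speaker model involved).
N = 8
signal = [math.sin(0.3 * t) for t in range(4 * N)]
frames = analyze(signal, N)
recon = synthesize(frames, N)              # lossless round trip
table = [0.05] * N                         # hypothetical initial step sizes
quantized = quantize(frames, table)
adv = synthesize(quantized, N)             # quantization-perturbed audio
toy_grad = [1.0, -1.0] * (N // 2)          # placeholder gradient values
new_table = pgd_table_step(table, toy_grad, alpha=0.01, q_min=0.01, q_max=0.1)
```

In this sketch the perturbation is controlled entirely by the table: larger steps discard more detail in a bin, and the projection keeps every step inside a fixed range. A real attack would repeat the update for many iterations, recomputing the gradient from the model each time.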

External IDs: dblp:conf/ijcnn/WangFY25a