Abstract: Speaker verification (SV) systems have become ubiquitous in security-critical applications in recent years. These systems encode speaker identity into high-dimensional embeddings, yet they remain vulnerable to adversarial attacks that manipulate those embeddings, so it is essential to expose as many “blind spots” of SV systems as possible. Existing attacks predominantly inject additive noise, which often compromises speech naturalness and offers no semantic control. In this paper, we propose the Timbre Adversarial attack (TimbreAdv), a novel paradigm that exploits vocal tract characteristics to deceive SV systems. Our framework combines hierarchical feature disentanglement, feature-level timbre blending, and multi-objective adversarial optimization to generate adversarial samples in a black-box setting. We evaluate the method with a comprehensive set of metrics, and the results show that the attack is both effective and stealthy.
External IDs: dblp:conf/icann/XiaoYLYCLXW25