HI-FAD and SpoofMix: A High-Frequency-Aware Framework and Realistic Benchmark for Robust Fake Audio Detection
Abstract: Recently, fake audio detection (FAD) has made great progress in response to increasingly sophisticated spoofing attacks.
However, existing frameworks still overlook two critical needs: (1) frequency-aware analysis of spoofing artifacts and (2) benchmarks that simulate real-world spoofing attacks based on speech mixtures.
To address these gaps, we propose HI-FAD, a novel high-frequency-aware FAD framework, and SpoofMix, a challenging benchmark incorporating both real and spoofed speech within single audio samples.
In particular, HI-FAD employs a discrete wavelet transform (DWT) to extract high-frequency subbands and fuses them with front-end model representations via cross-attention.
It can be seamlessly integrated into existing FAD models without any architectural modifications.
Experimental results demonstrate that HI-FAD consistently outperforms conventional methods on the ASVspoof2019 Logical Access (LA) and ASVspoof2021 LA datasets. Moreover, the proposed framework achieves state-of-the-art detection performance on SpoofMix, demonstrating its robustness under realistic mixed-speech conditions. The source code and SpoofMix benchmark are available at: https://github.com/blind-review-user123/HI-FAD.git
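The abstract's core mechanism, extracting high-frequency subbands with a DWT and fusing them with front-end representations via cross-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-level Haar DWT, the random projection matrices (standing in for learned weights), and the toy feature shapes are all illustrative assumptions.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: returns (approx, detail) subbands.
    The detail coefficients carry the high-frequency content."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(front_end, high_freq, dim=16, seed=0):
    """Fuse front-end features (queries) with high-frequency features
    (keys/values) via scaled dot-product cross-attention.
    Projection matrices are random stand-ins for learned weights."""
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((front_end.shape[-1], dim)) / np.sqrt(front_end.shape[-1])
    Wk = rng.standard_normal((high_freq.shape[-1], dim)) / np.sqrt(high_freq.shape[-1])
    Wv = rng.standard_normal((high_freq.shape[-1], dim)) / np.sqrt(high_freq.shape[-1])
    Q, K, V = front_end @ Wq, high_freq @ Wk, high_freq @ Wv
    attn = softmax(Q @ K.T / np.sqrt(dim))
    return attn @ V

# Toy example: a random waveform, framed into two feature sequences.
wave = np.random.default_rng(1).standard_normal(16000)
_, detail = haar_dwt(wave)                   # high-frequency subband
front = wave[:640].reshape(20, 32)           # stand-in front-end features
high = detail[:640].reshape(20, 32)          # stand-in high-freq features
fused = cross_attention(front, high)
print(fused.shape)  # (20, 16): one fused vector per frame
```

The key design point the abstract describes is that the fused output keeps the front-end's temporal resolution (one query per frame) while injecting high-frequency evidence through the attention weights, which is what allows the module to bolt onto an existing FAD front end.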
Paper Type: Short
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: Fake Audio Detection
Languages Studied: English
Submission Number: 7977