HI-FAD and SpoofMix: A High-Frequency-Aware Framework and Realistic Benchmark for Robust Fake Audio Detection
Abstract: Recently, fake audio detection (FAD) has made great progress in response to increasingly sophisticated spoofing attacks.
However, existing frameworks still overlook two critical needs: (1) frequency-aware analysis of spoofing artifacts and (2) benchmarks that simulate real-world spoofing attacks based on speech mixtures.
To address these gaps, we propose HI-FAD, a novel high-frequency-aware FAD framework, and SpoofMix, a challenging benchmark incorporating both real and spoofed speech within single audio samples.
In particular, HI-FAD employs a discrete wavelet transform (DWT) to extract high-frequency subbands and fuses them with front-end model representations via cross-attention.
It can be seamlessly integrated into existing FAD models without any architectural modifications.
Experimental results demonstrate that HI-FAD consistently outperforms conventional methods on the ASVspoof2019 Logical Access (LA) and ASVspoof2021 LA datasets. Moreover, the proposed framework achieves state-of-the-art detection performance on SpoofMix, demonstrating its robustness under realistic mixed-speech conditions. The source code and SpoofMix benchmark are available at: https://github.com/blind-review-user123/HI-FAD.git
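The abstract's core mechanism, extracting high-frequency subbands with a DWT and fusing them with front-end representations via cross-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-level Haar DWT, the random projection matrices (standing in for learned weights), and the toy feature shapes are all illustrative assumptions.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: returns (approx, detail) subbands.
    The detail coefficients carry the high-frequency content."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(front_end, high_freq, dim=16, seed=0):
    """Fuse front-end features (queries) with high-frequency features
    (keys/values) via scaled dot-product cross-attention.
    Projection matrices are random stand-ins for learned weights."""
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((front_end.shape[-1], dim)) / np.sqrt(front_end.shape[-1])
    Wk = rng.standard_normal((high_freq.shape[-1], dim)) / np.sqrt(high_freq.shape[-1])
    Wv = rng.standard_normal((high_freq.shape[-1], dim)) / np.sqrt(high_freq.shape[-1])
    Q, K, V = front_end @ Wq, high_freq @ Wk, high_freq @ Wv
    attn = softmax(Q @ K.T / np.sqrt(dim))
    return attn @ V

# Toy example: a random waveform, framed into two feature sequences.
wave = np.random.default_rng(1).standard_normal(16000)
_, detail = haar_dwt(wave)                   # high-frequency subband
front = wave[:640].reshape(20, 32)           # stand-in front-end features
high = detail[:640].reshape(20, 32)          # stand-in high-freq features
fused = cross_attention(front, high)
print(fused.shape)  # (20, 16): one fused vector per frame
```

The key design point the abstract describes is that the fused output keeps the front-end's temporal resolution (one query per frame) while injecting high-frequency evidence through the attention weights, which is what allows the module to bolt onto an existing FAD front end.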
Paper Type: Short
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: Fake Audio Detection
Languages Studied: English
Submission Number: 7977