IndicFake Meets SAFARI-LLM: Unifying Semantic and Acoustic Intelligence for Multilingual Deepfake Detection

TMLR Paper 5058 Authors

09 Jun 2025 (modified: 17 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: Audio deepfakes pose a growing threat, particularly in linguistically diverse and low-resource settings where existing detection methods often struggle. This work makes two contributions to address these challenges. First, we present \textbf{IndicFake}, a large-scale audio deepfake dataset with over 4.2 million samples (7,350 hours) spanning English and 17 Indian languages from the Indo-European, Dravidian, and Sino-Tibetan families. With minimal overlap with existing datasets (Jaccard similarity: 0.00--0.06), IndicFake offers a strong new benchmark for multilingual deepfake detection. Second, we propose \textbf{SAFARI-LLM} (Semantic Acoustic Feature Adaptive Router with Integrated LLM), a framework that fuses Whisper’s semantic embeddings with m-HuBERT’s acoustic features through an adaptive Audio Feature Unification Module (AFUM) and is further enhanced by a LoRA-fine-tuned LLaMA-7B. Evaluations on the IndicFake, DECRO, and WaveFake datasets show that SAFARI-LLM outperforms 14 state-of-the-art models and generalizes well across languages and language families, with standout accuracies of 94.21\% (English-to-Japanese transfer on WaveFake) and 84.48\% (English-to-Chinese transfer on DECRO). These contributions support more reliable, scalable audio deepfake detection. Code and resources are publicly available at: \href{https://anonymousillusion.github.io/indicfake/}{\textcolor{blue}{URL}}.
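The abstract describes the semantic/acoustic fusion only at a high level. As a rough illustration of the adaptive fusion idea, the sketch below shows a hypothetical gated unification module that blends a semantic stream (standing in for Whisper embeddings) and an acoustic stream (standing in for m-HuBERT features) before a binary real/fake head. All module names, dimensions, and the gating scheme are assumptions for illustration only, not the paper's AFUM or SAFARI-LLM implementation, and the LLaMA-7B stage is omitted.

```python
# Hypothetical sketch of adaptive semantic/acoustic feature fusion.
# Dimensions, gating scheme, and the classifier head are illustrative
# assumptions, not the paper's AFUM/SAFARI-LLM implementation.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Learn per-frame weights to blend semantic and acoustic embeddings."""

    def __init__(self, sem_dim: int, ac_dim: int, fused_dim: int):
        super().__init__()
        self.sem_proj = nn.Linear(sem_dim, fused_dim)
        self.ac_proj = nn.Linear(ac_dim, fused_dim)
        self.gate = nn.Sequential(nn.Linear(2 * fused_dim, fused_dim), nn.Sigmoid())

    def forward(self, sem: torch.Tensor, ac: torch.Tensor) -> torch.Tensor:
        s, a = self.sem_proj(sem), self.ac_proj(ac)   # (B, T, fused_dim)
        g = self.gate(torch.cat([s, a], dim=-1))      # adaptive per-frame gate
        return g * s + (1 - g) * a


class DeepfakeClassifier(nn.Module):
    """Fused features -> mean pooling over time -> real/fake logits."""

    def __init__(self, sem_dim: int = 512, ac_dim: int = 768, fused_dim: int = 256):
        super().__init__()
        self.fusion = GatedFusion(sem_dim, ac_dim, fused_dim)
        self.head = nn.Linear(fused_dim, 2)

    def forward(self, sem: torch.Tensor, ac: torch.Tensor) -> torch.Tensor:
        fused = self.fusion(sem, ac)
        return self.head(fused.mean(dim=1))


if __name__ == "__main__":
    # Random tensors stand in for frame-level Whisper (semantic) and m-HuBERT
    # (acoustic) embeddings; in practice they would come from those encoders.
    sem = torch.randn(4, 100, 512)
    ac = torch.randn(4, 100, 768)
    logits = DeepfakeClassifier()(sem, ac)
    print(logits.shape)  # torch.Size([4, 2])
```

A sigmoid gate is just one plausible way to realize "adaptive" fusion; attention-based routing or learned scalar mixing would fit the same interface.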
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Brian_Kingsbury1
Submission Number: 5058