Faking Fluent: Unveiling the Achilles' Heel of Multilingual Deepfake Detection

Published: 01 Jan 2024, Last Modified: 19 Feb 2025 · IJCB 2024 · CC BY-SA 4.0
Abstract: With the rapid advancement of deep learning techniques, the generation of audio deepfakes has achieved remarkable realism across various languages and accents. However, the effectiveness of audio deepfake detection models in diverse linguistic environments remains a crucial area of investigation. This paper presents the first empirical study on the robustness of current audio deepfake detection algorithms across different languages and accents. We evaluate whether these models maintain their effectiveness across varied linguistic domains or perform better in specific language contexts. Our comprehensive analysis examines state-of-the-art audio deepfake detection models trained on the ASVspoof 2019 and BhashaBluff datasets, assessing their performance across four diverse datasets: three representing similar-language variations (Speech Accent Archive, Svarah, and the UK English Accent Dataset) and one representing a different language (Vaani). Our results and supporting analysis indicate that while current models perform well on benchmark datasets, their ability to generalize across diverse linguistic conditions is limited. We identify potential vulnerabilities in existing models when faced with unfamiliar languages or accents, highlighting the need for more inclusive and adaptable detection systems. These findings underscore the need to strengthen audio deepfake detection across the global linguistic spectrum and to develop models capable of identifying synthetic speech effectively, regardless of language or accent.
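The abstract describes a cross-corpus evaluation protocol: a detector trained on one benchmark is scored on several unseen speech corpora to probe linguistic generalization. The sketch below is not the authors' code; it illustrates one common way to run such an evaluation, assuming the Equal Error Rate (EER) as the metric (standard in ASVspoof-style work) and using hypothetical placeholders for the detector's `score()` interface and the corpus loaders.

```python
# Minimal sketch of per-corpus EER evaluation for a trained deepfake detector.
# The detector, its score() method, and load_corpus() are hypothetical placeholders.
import numpy as np
from sklearn.metrics import roc_curve


def equal_error_rate(labels: np.ndarray, scores: np.ndarray) -> float:
    """EER: operating point where false-accept and false-reject rates meet.

    labels: 1 = bona fide speech, 0 = spoofed/deepfake speech.
    scores: higher values mean the detector believes the audio is bona fide.
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return float((fpr[idx] + fnr[idx]) / 2.0)


def evaluate_across_corpora(detector, corpora: dict) -> dict:
    """Score one trained detector on every corpus and collect per-corpus EER.

    `corpora` maps a corpus name to a (waveforms, labels) pair.
    """
    results = {}
    for name, (waveforms, labels) in corpora.items():
        scores = detector.score(waveforms)  # assumed interface, not a real API
        results[name] = equal_error_rate(np.asarray(labels), np.asarray(scores))
    return results


# Example usage with the corpora named in the abstract (loaders are placeholders):
# corpora = {
#     "Speech Accent Archive": load_corpus("speech_accent_archive"),
#     "Svarah": load_corpus("svarah"),
#     "UK English Accent Dataset": load_corpus("uk_english_accents"),
#     "Vaani": load_corpus("vaani"),
# }
# print(evaluate_across_corpora(trained_detector, corpora))
```

Reporting EER separately for each corpus, rather than pooling all scores, is what exposes the generalization gap the paper studies: a model can look strong on its training benchmark while its error rate rises sharply on unfamiliar languages or accents.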