SVDF-20: A LARGE-SCALE MULTILINGUAL BENCHMARK FOR AI-GENERATED SINGING DETECTION

Jyotishman Das; Mayank Vatsa; Richa Singh

20 Sept 2025 (modified: 14 Dec 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: singing voice deepfake detection, multilingual benchmark, cross-lingual generalization, audio forensics, deepfake detection, SVDD, multilingual audio, voice synthesis detection, audio authenticity, singing voice synthesis, cross-domain evaluation, audio deepfake benchmark, multilingual audio forensics, singing voice authentication, audio spoofing detection, machine learning, deep learning

TL;DR: SVDF-20: First large-scale multilingual singing voice deepfake detection benchmark (20 languages, 772K+ clips). Training on SVDF-20 yields 30-40% better performance on unseen languages vs SingFake baselines.

Abstract: As generative models replicate human singing with uncanny precision, detection systems must operate reliably across all languages, not just English or Mandarin. Current detectors fail catastrophically on unfamiliar languages, a critical gap we address with SVDF-20, the first comprehensive multilingual singing voice deepfake detection benchmark. Our contributions are threefold: (1) We provide a quality-controlled dataset of 24,421 songs (1,475.6 hours) across 20 languages, introducing 87% novel linguistic content compared to existing resources, including all 10 major Indic languages previously absent from singing voice deepfake detection research. (2) We demonstrate through experiments on eight architectures that multilingual training is essential: models trained on limited languages degrade to 45% Equal Error Rate (EER) on diverse languages, while SVDF-20-trained models achieve a 31% relative improvement, maintaining robust detection across all linguistic contexts. (3) We establish evaluation protocols with singer-disjoint splits and codec robustness tests that reveal how linguistic diversity fundamentally changes what models learn, shifting from language-specific patterns to universal synthesis artifacts. These findings establish that SVDF-20 enables the development of deepfake detectors capable of safeguarding musical authenticity globally, not just in data-rich languages. Data and Code: https://anonymous.4open.science/r/SVDF20-D328/
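For readers unfamiliar with the Equal Error Rate (EER) reported in the abstract, the following is a minimal sketch of how the metric is commonly computed from detector scores. It is not taken from the SVDF-20 code release; the function name `equal_error_rate`, the score convention (higher means more likely synthetic), and the toy scores are all assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a decision threshold.

    Assumed convention (not from the paper): a higher score means
    "more likely deepfake"; labels are 1 = deepfake, 0 = bona fide.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    fake = scores[labels == 1]
    real = scores[labels == 0]

    # Candidate thresholds: every observed score, plus one above the
    # maximum so that "flag nothing" is also considered.
    thresholds = np.unique(scores)
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)

    # False positive rate: bona fide clips flagged as deepfake.
    fpr = np.array([(real >= t).mean() for t in thresholds])
    # False negative rate: deepfake clips that were not flagged.
    fnr = np.array([(fake < t).mean() for t in thresholds])

    # The EER is the operating point where both error rates coincide;
    # with finitely many scores we take the closest pair and average.
    idx = np.argmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2.0

# Toy, made-up scores purely to exercise the function.
toy_scores = [0.1, 0.6, 0.4, 0.9]
toy_labels = [0, 0, 1, 1]
print(f"EER on toy data: {equal_error_rate(toy_scores, toy_labels):.2f}")
```

An EER of 50% corresponds to chance-level separation on balanced decisions, which gives a sense of scale for the 45% figure quoted for models trained on limited languages.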

Primary Area: datasets and benchmarks

Submission Number: 24143
