MAVOS-DD: Multilingual Audio-Video Open-Set Deepfake Detection Benchmark

ICLR 2026 Conference Submission 16934 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: deepfake detection, audio-video benchmark, multilingual benchmark
TL;DR: We present the first large-scale open-set benchmark for multilingual audio-video deepfake detection.
Abstract: We present the first large-scale open-set benchmark for multilingual audio-video deepfake detection. Our dataset comprises over 300 hours of real and fake videos across eight languages, with 58% of the data being generated. For each language, the fake videos are generated with several distinct audio and video deepfake generation models, selected based on the quality of the generated content. We organize the training, validation, and test splits such that only a subset of the chosen generative models and languages is available during training, thus creating several challenging open-set evaluation setups. We perform experiments with various pre-trained and fine-tuned deepfake detectors proposed in recent literature. Our results show that state-of-the-art detectors are currently unable to maintain their performance levels when tested in our open-set scenarios. We publicly release our data and code at: https://anonymous.4open.science/r/MAVOS-DD.
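The open-set protocol described in the abstract, where only a subset of generative models and languages is seen during training, can be sketched as a simple partition over sample metadata. This is a minimal illustrative sketch: the field names, the seen/unseen sets, and the `split` helper are hypothetical, not taken from the MAVOS-DD release.

```python
# Hypothetical sketch of an open-set split: samples whose generative
# model AND language were both seen during training go to the train
# pool; anything involving an unseen model or language is held out
# for open-set evaluation. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Sample:
    path: str
    language: str
    gen_model: str   # "real" marks genuine (non-generated) videos
    label: int       # 0 = real, 1 = fake

# Illustrative choices of "seen" languages and generators.
SEEN_LANGS = {"english", "spanish", "german", "russian"}
SEEN_MODELS = {"real", "model_a", "model_b"}

def open_set_split(samples):
    train, open_test = [], []
    for s in samples:
        if s.language in SEEN_LANGS and s.gen_model in SEEN_MODELS:
            train.append(s)
        else:
            open_test.append(s)  # unseen model and/or unseen language
    return train, open_test
```

Under this scheme a detector is trained only on the closed pool, so the held-out pool probes generalization to generators and languages it has never observed.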
Primary Area: datasets and benchmarks
Submission Number: 16934