FVBench: Benchmarking Deepfake Video Detection Capability of Large Multimodal Models

16 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Deepfake Video Detection, Large Multimodal Models, Benchmark and Dataset
Abstract: As generative models rapidly evolve, the realism of AI-generated videos has reached new levels, posing significant challenges for verifying the authenticity of video content. Existing deepfake detection techniques generally rely on training datasets with limited generation methods and content diversity, which constrains their generalization to more realistic content, particularly that produced by the latest generative models. Recently, large multimodal models (LMMs) have demonstrated remarkable zero-shot performance across a variety of vision tasks, yet their ability to discern deepfake videos remains largely untested. To this end, we propose **FVBench**, a comprehensive deep**f**ake **v**ideo **bench**mark designed to advance video deepfake detection. It offers: **(i)** extensive content diversity, with over 120K videos covering real, AI-edited, and fully AI-generated categories; **(ii)** comprehensive model coverage, with fake videos generated and edited by 42 state-of-the-art video synthesis and editing models; and **(iii)** a deepfake video detection benchmark for systematically evaluating the detection capabilities of LMMs. The FVBench dataset and evaluation code will be made publicly available upon publication, offering a valuable resource for advancing deepfake detection.
Primary Area: datasets and benchmarks
Submission Number: 6819