Abstract: Deepfake detection aims to automatically recognize manipulated media by analyzing whether it contains forgeries generated through deep learning. It is natural to ask which of the existing deepfake detection approaches stand out as top performers. This question is pivotal for identifying promising research directions and offering practical guidance. Unfortunately, conducting a sound benchmark comparison of popular detection approaches based on results reported in the literature is challenging due to inconsistent evaluation conditions across studies. In this paper, our objective is to achieve a sound comparison between detection approaches by establishing a comprehensive and consistent benchmark, developing a repeatable evaluation procedure, and performing extensive performance evaluations. To this end, we collect a challenging dataset consisting of manipulated samples generated by more than 12 different methods. We then implement and evaluate 13 prominent detection approaches (comprising 11 algorithms) from the existing literature, using five fair and practical evaluation metrics. In total, we provide up to 882 comprehensive evaluations by training 117 detection models. The results, along with the shared data and evaluation methodology, constitute a benchmark for comparing deepfake detection approaches and measuring progress.