Keywords: deepfake detection; color-based spatial-temporal (CST) feature map
Abstract: Deepfake detection continues to grapple with challenges arising from the rapid evolution of generative models and the intricate characteristics of real-world data. Current detection frameworks frequently overfit to particular artifacts, which constrains their effectiveness against novel manipulation techniques. While many models achieve high accuracy on standardized benchmark datasets, their performance often deteriorates when confronted with real-world deepfake instances. This study investigates the integration of biometric data, specifically targeting the inability of deepfake generation to reproduce the subtle biometric variations present in human faces. By segmenting facial regions into mesh representations, we analyze the correlation between RGB features and biometric signals, focusing in particular on heart rate data. This approach enables the construction of Color-Based Spatial-Temporal (CST) feature maps, which provide a more nuanced depiction of the interactions between visual attributes and biometric inputs. The goal of this study is to propose this novel feature map and evaluate its performance. We assess the effectiveness of these biosignal feature maps in conjunction with established detection models on the FaceForensics++ (c23 and c40 compression levels) and Celeb-DF datasets. Incorporating the feature maps yields strong results, with nearly 99% accuracy (ACC) and an area under the curve (AUC) approaching 1. Notably, our method remains effective on low-quality deepfakes with high compression levels. Transitioning to a transfer learning framework while retaining the biosignal feature maps yields further gains in performance metrics. These findings underscore the considerable value of integrating biometric information to bolster deepfake detection, often surpassing prior results while remaining anchored in fundamental learning principles. The model exhibits consistent performance across diverse cross-testing scenarios, highlighting its robustness and adaptability.
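As a rough illustration of the CST idea described above (a minimal sketch, not the authors' exact pipeline), the snippet below builds a per-region spatial-temporal color map from a video: each facial mesh region is reduced to its mean RGB value per frame, and each region's temporal trace is then normalized so the map reflects pulsatile (remote-PPG-like) variation rather than absolute skin tone. The function name cst_feature_map, the boolean-mask input format, and the normalization step are all assumptions made for this example; region masks would come from a face-mesh tool such as MediaPipe Face Mesh.

```python
import numpy as np

def cst_feature_map(frames, region_masks):
    """Build a color-based spatial-temporal (CST)-style feature map.

    frames: sequence of T RGB frames, each an (H, W, 3) array.
    region_masks: sequence of R boolean (H, W) masks, one per mesh region
        (assumed non-empty). Returns an (R, T, 3) float32 array holding the
        normalized mean RGB of each region over time.
    """
    T, R = len(frames), len(region_masks)
    fmap = np.zeros((R, T, 3), dtype=np.float32)
    for t, frame in enumerate(frames):
        frame = frame.astype(np.float32)
        for r, mask in enumerate(region_masks):
            # Mean RGB inside the region; the subtle periodic variation of
            # these means over time carries the heart-rate (rPPG) signal.
            fmap[r, t] = frame[mask].mean(axis=0)
    # Zero-mean / unit-variance per region and channel across time, so the
    # map emphasizes temporal color variation rather than baseline skin tone.
    mean = fmap.mean(axis=1, keepdims=True)
    std = fmap.std(axis=1, keepdims=True) + 1e-8
    return (fmap - mean) / std
```

The resulting (R, T, 3) map can be treated as an image-like input to a standard CNN detector, which is one plausible way to combine such biosignal feature maps with established detection models as the abstract describes.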
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8450