Low-Quality Deepfake Video Detection Model Targeting Compression-Degraded Spatiotemporal Inconsistencies

Published: 01 Jan 2024 · Last Modified: 26 Jan 2025 · ICIC (9) 2024 · CC BY-SA 4.0
Abstract: As Deepfakes become an increasing threat to individuals and society, multimedia forensics, and Deepfake video detection in particular, is in dire need of effective methods. However, most current approaches still rely heavily on fine-grained spatial features, which causes them to underperform on heavily compressed Deepfake videos that have undergone multiple rounds of dissemination in the real world. In this paper, we first present a theoretical analysis of the Deepfake video manipulation paradigm to characterize the systematic spatial and temporal inconsistencies introduced by the frame-wise manipulation process, together with an analysis of how video recompression severely degrades fine spatial inconsistency traces while leaving temporal inconsistencies largely intact. Motivated by this analysis, a Heavy-Compression Guidance Regulation (HCGR) module is proposed to steer feature extraction toward more resilient and effective features. Combined with a Multi-layer Dynamic Feature Fusion (MDFF) module and a Temporal Inconsistency Pattern Learning (TIPL) module, we obtain a Deepfake video detection model resilient to heavy video compression. Extensive experimental results show that the proposed model exceeds state-of-the-art methods under these more difficult conditions.
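The abstract's central premise (compression attenuates fine spatial artifact traces far more than temporal ones) can be illustrated with a minimal NumPy sketch. This is not the paper's pipeline: the block-averaging `lowpass` function stands in for lossy compression, and the synthetic per-frame flicker stands in for frame-wise manipulation inconsistency; both are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def lowpass(frames, k=4):
    # Crude stand-in for lossy compression (NOT a real codec):
    # block-average each frame, discarding fine spatial detail
    # while leaving per-frame means untouched.
    t, h, w = frames.shape
    pooled = frames.reshape(t, h // k, k, w // k, k).mean(axis=(2, 4))
    return pooled.repeat(k, axis=1).repeat(k, axis=2)

# Synthetic "video": fine spatial detail (constant over time) plus a
# per-frame brightness flicker mimicking frame-wise manipulation.
T, H, W = 8, 32, 32
spatial_detail = rng.normal(0.0, 2.0, (1, H, W)).repeat(T, axis=0)
temporal_flicker = rng.normal(0.0, 20.0, (T, 1, 1)) * np.ones((1, H, W))
frames = 128.0 + spatial_detail + temporal_flicker

def spatial_energy(f):
    # High-frequency spatial cue: mean horizontal gradient magnitude.
    return np.abs(np.diff(f, axis=2)).mean()

def temporal_energy(f):
    # Temporal inconsistency cue: mean frame-to-frame difference.
    return np.abs(np.diff(f, axis=0)).mean()

compressed = lowpass(frames)
s_ratio = spatial_energy(compressed) / spatial_energy(frames)
t_ratio = temporal_energy(compressed) / temporal_energy(frames)
# Spatial cue energy collapses after "compression", while the
# temporal cue survives essentially intact (s_ratio << t_ratio).
```

Under these toy assumptions, the spatial-cue ratio drops well below 1 while the temporal-cue ratio stays near 1, which is the behavior motivating the paper's emphasis on temporal inconsistency features.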
