Abstract: Face forgery detection is crucial for preserving the security and integrity of facial data amidst the rapid development of face manipulation techniques and deep generative models. Existing methods for video face forgery detection typically assume that all frames in a forged video are manipulated, while identifying partially forged videos with only a subset of altered frames remains an open challenge. To address this issue, we propose a novel framework, UVIF, that utilizes additional annotated images to provide fine-grained supervision for detecting partial forgeries in videos. UVIF integrates a unified encoder and a multi-task learning paradigm to model both facial videos and images, improving video face forgery detection. The unified encoder is built on a 2D backbone equipped with temporal fusion modules. A pseudo-labeling process is also designed for facial video frames to bridge the representations of individual video frames and static images. Extensive experiments on benchmark datasets demonstrate the effectiveness of our framework, which outperforms state-of-the-art methods in detecting partially forged videos while introducing no additional computational overhead. Our code is available at https://github.com/haotianll/UVIF.
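The abstract does not spell out the encoder's implementation; the following is a minimal sketch of the general idea of a 2D backbone with temporal fusion modules producing frame-level predictions. The class names (`TemporalFusion`, `UnifiedEncoder`), the ResNet-18 backbone, the depthwise temporal convolution, and the fusion placement are all illustrative assumptions, not UVIF's actual design; the real implementation is in the linked repository.

```python
# Hypothetical sketch: a shared 2D backbone with a temporal fusion module,
# applied to both videos (T > 1) and static images (T = 1).
import torch
import torch.nn as nn
import torchvision.models as models

class TemporalFusion(nn.Module):
    """Fuses information across frames via a residual depthwise 1D temporal conv."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (B*T, C, H, W) -> convolve over T per spatial location -> same shape.
        bt, c, h, w = x.shape
        b = bt // num_frames
        x = x.view(b, num_frames, c, h * w).permute(0, 3, 2, 1)  # (B, HW, C, T)
        x = x.reshape(b * h * w, c, num_frames)
        x = x + self.conv(x)                                      # residual fusion
        x = x.view(b, h * w, c, num_frames).permute(0, 3, 2, 1)   # (B, T, C, HW)
        return x.reshape(bt, c, h, w)

class UnifiedEncoder(nn.Module):
    """2D backbone shared by images and video frames, with mid-level temporal fusion."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1, backbone.layer2)
        self.fusion = TemporalFusion(channels=128)  # ResNet-18 layer2 width
        self.tail = nn.Sequential(backbone.layer3, backbone.layer4, backbone.avgpool)
        self.head = nn.Linear(512, num_classes)     # real vs. fake, per frame

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W); a static image is simply a clip with T = 1.
        b, t = frames.shape[:2]
        x = self.stem(frames.flatten(0, 1))   # (B*T, 128, H', W')
        x = self.fusion(x, num_frames=t)      # temporal fusion across the clip
        x = self.tail(x).flatten(1)           # (B*T, 512)
        return self.head(x).view(b, t, -1)    # frame-level logits

clip = torch.randn(2, 8, 3, 224, 224)   # a batch of two 8-frame clips
logits = UnifiedEncoder()(clip)         # (2, 8, 2): per-frame predictions
```

Frame-level logits are the point of such a design for partial forgeries: supervision from annotated images (and pseudo-labeled frames) can attach to individual frames rather than to a single clip-level label.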