Hybrid Spatio-Temporal Network for Face Forgery Detection

Xuhui Liu, Sicheng Gao, Peizhu Zhou, Jianzhuang Liu, Xiaoyan Luo, Luping Zhang, Baochang Zhang

Published: 01 Jan 2023, Last Modified: 09 May 2025, ACPR (3) 2023, CC BY-SA 4.0

Abstract: Facial manipulation techniques have raised growing security concerns, prompting a variety of methods for detecting forged videos. However, existing methods suffer from a significant performance gap compared to image manipulation methods, partly because spatio-temporal information is not well explored. To address this issue, we introduce a Hybrid Spatio-Temporal Network (HSTNet) that integrates spatial and temporal information in a single framework. Specifically, HSTNet uses a hybrid architecture, consisting of a 3D CNN branch and a transformer branch, to jointly learn short- and long-range relations in the spatio-temporal dimension. Because the features of the two branches are misaligned, we design a Feature Alignment Block (FAB) to recalibrate and efficiently fuse the heterogeneous features. Moreover, HSTNet introduces a Vector Selection Block (VSB) to combine the outputs of the two branches and activate important features for classification. Extensive experiments show that HSTNet achieves the best overall performance among state-of-the-art methods.
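
The abstract does not describe the internals of the VSB, so the following is only a generic illustration of what "combining two branch outputs and emphasizing important features" can look like, not the authors' block. It assumes each branch has already been reduced to a feature vector of the same length, and it applies a per-channel softmax so that the branch with the stronger response contributes more to the fused value. The names `softmax` and `select_and_fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_and_fuse(cnn_vec, trans_vec):
    """Fuse two equal-length feature vectors channel by channel.

    Assumption (not from the paper): for every channel, the two branch
    responses compete through a softmax, and the fused value is their
    weighted sum. The stronger response therefore dominates.
    """
    if len(cnn_vec) != len(trans_vec):
        raise ValueError("branch vectors must have the same length")
    fused = []
    for c, t in zip(cnn_vec, trans_vec):
        w_cnn, w_trans = softmax([c, t])
        fused.append(w_cnn * c + w_trans * t)
    return fused


# Toy vectors standing in for pooled 3D CNN and transformer features.
cnn_features = [1.0, 0.0]
transformer_features = [1.0, 2.0]
fused_features = select_and_fuse(cnn_features, transformer_features)
```

In this toy case the first channel is unchanged because both branches agree, while the second channel is pulled toward the transformer response because it is larger. A learned block would normally derive the selection weights from trainable layers rather than from the raw activations.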