MeST-Former: Motion-enhanced Spatiotemporal Transformer for generalizable Deepfake detection

Baoping Liu, Bo Liu, Ming Ding, Tianqing Zhu

Published: 2024, Last Modified: 23 Oct 2025. Venue: Neurocomputing 2024. License: CC BY-SA 4.0

Abstract: Highlights
• RGB and motion inputs provide distinctive spatial and temporal information.
• Detaching ID-related components from the original embeddings can improve the generalization capabilities of the Deepfake detector.
• The Swin Transformer is powerful in modeling spatiotemporal embeddings for classification.
• An effectively implemented face-cropping strategy minimizes the influence of background elements.