MeST-Former: Motion-enhanced Spatiotemporal Transformer for generalizable Deepfake detection

Published: 01 Jan 2024, Last Modified: 23 Oct 2025, Venue: Neurocomputing 2024, License: CC BY-SA 4.0
Abstract: Highlights
• RGB and motion inputs provide distinctive spatial and temporal information.
• Detaching ID-related components from the original embeddings can improve the generalization capability of the Deepfake detector.
• The Swin Transformer is powerful in modeling spatiotemporal embeddings for classification.
• An effectively implemented face-cropping strategy minimizes the influence of background elements.