Abstract: In the era of digital media, the proliferation of forged images and videos poses a significant threat to societal stability. Rapid advances in deep learning have made realistic fake images increasingly easy to generate, which makes image authenticity harder to verify than ever. Although some existing methods show promising results in forgery detection, they often underutilize facial semantic information. To address this issue, this paper introduces the Semantic Token Transformer for Face Forgery Detection. The method combines facial semantic information with a transformer network: the transformer's input tokens are converted into tokens of varying shapes and sizes according to their importance, which improves detection accuracy. First, an image processing stage manipulates the image based on facial semantic information. Next, a scoring network guided by prior knowledge adaptively assigns tokens to clusters based on their importance and their relevance to the output of the preprocessing stage. Finally, the tokens within each cluster are merged using an attention mechanism and passed to the detector for forgery detection. Experiments on multiple datasets, including cross-dataset evaluations, show that our approach outperforms state-of-the-art detection methods.
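The score, cluster, and merge pipeline described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names `softmax` and `merge_tokens`, the parameters `keep` and `num_merged`, and the choice to group tokens by score rank and weight them with a softmax over their scores are all assumptions made for illustration. In the paper the scores come from a learned scoring network; here they are simply passed in.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def merge_tokens(tokens, scores, keep, num_merged):
    """Hypothetical importance-based token merging.

    tokens:     list of equal-length float lists (one embedding per token)
    scores:     one importance score per token (assumed given, not learned here)
    keep:       number of highest-scoring tokens left unmerged (fine-grained)
    num_merged: upper bound on the number of clusters for the remaining tokens
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")

    # Rank token indices from most to least important.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    top, rest = order[:keep], order[keep:]

    # The most important tokens pass through unchanged.
    out = [list(tokens[i]) for i in top]

    if rest and num_merged > 0:
        num_merged = min(num_merged, len(rest))
        size = math.ceil(len(rest) / num_merged)
        dim = len(tokens[0])
        # Each consecutive slice of the ranking forms one cluster, which is
        # collapsed into a single token by a score-weighted average.
        for start in range(0, len(rest), size):
            idx = rest[start:start + size]
            weights = softmax([scores[i] for i in idx])
            out.append([
                sum(w * tokens[i][c] for w, i in zip(weights, idx))
                for c in range(dim)
            ])
    return out


# Example: keep the single most important token, merge the other two into one.
example = merge_tokens(
    tokens=[[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]],
    scores=[5.0, 1.0, 1.0],
    keep=1,
    num_merged=1,
)
```

The result is a shorter token sequence in which high-importance regions keep their original resolution and low-importance regions are represented by fewer, coarser tokens, which is one way to read "tokens of varying shapes and sizes".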
External IDs: dblp:journals/tifs/PengLLWHG25