Keywords: Deepfake detection, weakly-supervised deepfake localization, diffusion-generated image detection and localization
Abstract: The striking proficiency of generative models in producing and manipulating images with an unprecedented level of realism has raised concerns about malicious applications such as face manipulation. However, most existing face forgery detection models provide only a binary real-or-fake label for a given image, which is insufficient to identify the location of the manipulated area. In this paper, we propose a weakly supervised method for face manipulation localization and detection based on the Vision Transformer architecture, which uses only image-level labels and achieves forgery localization without extra pixel-level annotations. Unlike other weakly supervised localization methods, which predict directly from the features of a single image, we design a novel weakly supervised localization method (MVG-FL) that leverages the statistical distribution characteristics of the entire dataset. MVG-FL estimates multivariate Gaussian (MVG) distributions for real and fake samples and uses the learned distributions to predict the location of the manipulated area. Additionally, based on the predicted mask, we propose a Distribution Centrality Learning scheme that improves the compactness of patch embeddings around the distribution centers to further promote forgery localization. We also develop a new large-scale face-manipulated image dataset, named DiffFMD, which is constructed with various state-of-the-art diffusion-based generators and covers multiple sizes of facial manipulation regions. Experimental results demonstrate that the proposed method achieves high detection and localization performance on face manipulation images.
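To make the core idea concrete, the following is a minimal illustrative sketch (not the paper's actual MVG-FL implementation) of localizing anomalous patches by fitting a multivariate Gaussian to patch embeddings and scoring each patch by its Mahalanobis distance to the distribution center; the function names `fit_mvg` and `mahalanobis_map`, the toy data, and the simple mean-based threshold are all assumptions for illustration.

```python
import numpy as np

def fit_mvg(embeddings):
    """Fit a multivariate Gaussian to patch embeddings of shape (N, D).
    Returns the mean and the inverse covariance (regularized for stability)."""
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False) + 1e-6 * np.eye(embeddings.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_map(patches, mu, cov_inv):
    """Per-patch Mahalanobis distance to the fitted distribution center."""
    diff = patches - mu
    return np.sqrt(np.einsum('nd,dk,nk->n', diff, cov_inv, diff))

# Toy usage: patches from the "real" distribution cluster near the center,
# while one synthetic "manipulated" patch lies far away.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))       # embeddings of real patches
mu, cov_inv = fit_mvg(real)

patches = np.vstack([rng.normal(0.0, 1.0, size=(4, 8)),   # 4 real-like patches
                     np.full((1, 8), 6.0)])                # 1 outlier patch
scores = mahalanobis_map(patches, mu, cov_inv)
mask = scores > scores.mean()  # crude threshold -> predicted manipulated patches
```

Reshaping `mask` back to the patch grid would yield a coarse localization map; the paper's method additionally models a fake-sample distribution and refines embeddings via Distribution Centrality Learning, which this sketch omits.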
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24285