Diffusion Facial Forgery Detection

Harry Cheng; Yangyang Guo; Tianyi Wang; Liqiang Nie; Mohan Kankanhalli

Published: 20 Jul 2024, Last Modified: 21 Jul 2024. Venue: MM2024 Poster. License: CC BY 4.0

Abstract:

Detecting diffusion-generated images has recently emerged as a research area. Existing diffusion-based datasets predominantly focus on general image generation, while facial forgeries, which pose severe social risks, remain less explored. To address this gap, this paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images. DiFF comprises over 500,000 images synthesized using thirteen distinct generation methods under four conditions. In particular, the dataset uses 30,000 carefully collected textual and visual prompts to ensure that the synthesized images have both high fidelity and semantic consistency. We conduct extensive experiments on DiFF through human subject tests and several representative forgery detection methods. The results show that the binary detection accuracies of both human observers and automated detectors often fall below 30%, highlighting how difficult diffusion-generated facial forgeries are to detect. Moreover, our experiments demonstrate that DiFF contains a more diverse and realistic range of forgeries than previous facial forgery datasets, showing its potential to aid the development of more generalized detectors. Finally, we propose an edge graph regularization approach that effectively enhances the generalization capability of existing detectors.

Primary Subject Area: [Experience] Multimedia Applications

Secondary Subject Area: [Generation] Generative Multimedia, [Generation] Social Aspects of Generative AI, [Content] Vision and Language

Relevance To Conference: This paper constructs a facial forgery dataset based on diffusion models to aid research on forged-image detection. Our dataset covers four generation conditions: Text-to-Image, Image-to-Image, Face Swapping, and Face Editing, with over 500,000 images synthesized from multi-modal information such as text, images, and additional visual features (e.g., segmentation, landmarks). We conduct extensive experiments on our dataset, thereby establishing an effective benchmark for diffusion facial forgery detection.

Supplementary Material: zip

Submission Number: 9
