Robust Training for Deepfake Detection Models Against Disruption-Induced Data Poisoning

Published: 2023, Last Modified: 08 Jan 2026 · WISA 2023 · CC BY-SA 4.0
Abstract: As Generative Adversarial Networks continue to evolve, deepfake images have become notably more realistic, escalating societal, economic, and political threats. Consequently, deepfake detection has emerged as a crucial research area for countering these rising threats. In parallel, deepfake disruption, a method that adds proactive perturbations to genuine images to thwart deepfake generation, has been proposed as a prospective defense mechanism. While adopting these two strategies simultaneously seems beneficial in countering deepfakes, this paper first highlights a concern raised by their co-existence: genuine images gathered from the Internet, already carrying disruptive perturbations, can poison the training datasets of deepfake detection models and thereby severely degrade detection accuracy. Despite its practical implications, this problem has not been adequately addressed in previous deepfake detection studies. This paper proposes a novel training framework to address it. Our approach purifies disruptive perturbations during model training using the reverse process of a denoising diffusion probabilistic model. This purification, faster than the leading method DiffPure, enables successful deepfake image generation for training and significantly curtails the accuracy loss on poisoned datasets. Our framework demonstrates superior performance across detection models and is expected to be broadly applicable. Our implementation is available at https://github.com/seclab-yonsei/Anti-disrupt.
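
The general purification idea referenced in the abstract can be illustrated with a minimal sketch: partially noise the incoming image along the DDPM forward process, then run the learned reverse process back toward a clean image, washing out adversarial perturbations along the way. This is the intuition shared by DiffPure-style methods; the noise schedule, the stopping step `t_star`, and the `denoiser` interface below are illustrative assumptions, not the paper's actual implementation, and a real setup would load a pretrained DDPM.

```python
import torch

T = 1000                                    # total diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # standard linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)   # \bar{alpha}_t = prod_{s<=t} alpha_s


def purify(x, denoiser, t_star=100):
    """Purify a (possibly perturbed) image batch x scaled to [-1, 1].

    1) Forward-diffuse x to step t_star in one closed-form jump:
       x_t = sqrt(abar_t) * x + sqrt(1 - abar_t) * eps.
    2) Reverse-denoise from t_star down to 0 with the ancestral DDPM sampler.
    Stopping at a small t_star (rather than the full trajectory) is the usual
    way such purification is kept cheap; the paper's speed-up over DiffPure
    may differ in detail.
    """
    abar = alpha_bars[t_star - 1]
    x_t = abar.sqrt() * x + (1 - abar).sqrt() * torch.randn_like(x)

    for t in reversed(range(t_star)):
        a, abar = alphas[t], alpha_bars[t]
        # Epsilon-prediction network: estimates the noise present at step t.
        eps_hat = denoiser(x_t, torch.full((x.shape[0],), t, device=x.device))
        # Posterior mean of the DDPM reverse step.
        mean = (x_t - (1 - a) / (1 - abar).sqrt() * eps_hat) / a.sqrt()
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + betas[t].sqrt() * noise
    return x_t


if __name__ == "__main__":
    # Stand-in denoiser for a runnable demo; a real one is a trained U-Net.
    dummy_denoiser = lambda x, t: torch.zeros_like(x)
    x = torch.rand(2, 3, 32, 32) * 2 - 1
    print(purify(x, dummy_denoiser, t_star=50).shape)  # torch.Size([2, 3, 32, 32])
```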