Reputation Defender: Local Black-Box Adversarial Attack against Image-Translation-Based DeepFake

Published: 01 Jan 2024 · Last Modified: 19 Feb 2025 · ICME 2024 · CC BY-SA 4.0
Abstract: DeepFake technologies can convincingly alter the expressions, appearance, and identity of targets in photos and videos. This capability has enabled various forms of misuse, e.g., blackmail, nonconsensual pornography, and political disinformation, that can severely harm victims' reputations. To mitigate this issue, a leading defensive approach is to add adversarial perturbations to the original images or videos so that the core components of image-translation-based DeepFake systems fail. However, existing perturbation techniques for such systems are mostly designed for white-box settings, which makes them hard to apply in realistic scenarios. Moreover, they indiscriminately perturb the entire image and often fail to protect the most critical facial regions. In this paper, we propose ReDef, a novel black-box adversarial perturbation generation framework that focuses its perturbations narrowly on facial regions to fool image-translation-based DeepFake systems. By diversifying the model's outputs and using prior knowledge to guide the optimization direction of the adversarial perturbations, ReDef achieves better query efficiency and higher attack success rates: compared with state-of-the-art methods, it improves the attack success rate (ASR) by 37.8% and reduces the query count by 41.3%.
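The abstract gives no implementation details, but the core setting it describes, query-based optimization of an adversarial perturbation restricted to facial regions against a black-box image-translation model, can be illustrated with a minimal sketch. The code below uses a generic greedy random search (in the spirit of SimBA-style black-box attacks), not ReDef's actual algorithm; `query_deepfake`, the face mask, and all parameter values are hypothetical stand-ins introduced only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the black-box DeepFake system under attack:
# maps an image in [0, 1]^(H, W, 3) to a translated image of the same
# shape. Only query access is assumed, as in the paper's threat model.
def query_deepfake(image: np.ndarray) -> np.ndarray:
    raise NotImplementedError("replace with the target image-translation model")

def masked_random_search(image, face_mask, eps=8 / 255, step=2 / 255,
                         queries=1000, rng=None):
    """Greedy random search that perturbs only facial pixels.

    image:     clean image, float array in [0, 1], shape (H, W, 3)
    face_mask: boolean array, shape (H, W), True on facial regions
    eps:       L_inf budget for the perturbation
    """
    rng = rng or np.random.default_rng(0)
    baseline = query_deepfake(image)        # translation of the clean input
    delta = np.zeros_like(image)
    best = 0.0                              # output distortion achieved so far
    coords = np.argwhere(face_mask)         # candidate pixels: facial region only
    for _ in range(queries):
        y, x = coords[rng.integers(len(coords))]
        c = rng.integers(3)
        sign = rng.choice([-step, step])
        trial = delta.copy()
        trial[y, x, c] = np.clip(trial[y, x, c] + sign, -eps, eps)
        adv = np.clip(image + trial, 0.0, 1.0)
        # Objective: push the translated output as far from the clean
        # translation as possible, so the forgery visibly fails.
        score = float(np.mean((query_deepfake(adv) - baseline) ** 2))
        if score > best:                    # keep the step only if it helps
            best, delta = score, trial
    return np.clip(image + delta, 0.0, 1.0)
```

Restricting the candidate pixels to the face mask both matches the protection goal stated in the abstract and shrinks the search space, which is one plausible reason a region-focused attack can be more query-efficient than perturbing the whole image.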