Abstract: With advances in deep learning, it is now possible to generate highly
realistic images and videos, i.e., deepfakes. These deepfakes can reach mass
audiences and have adverse impacts on society. Although considerable effort has
been devoted to detecting deepfakes, detector performance drops significantly on
previously unseen but related manipulations, and generalization remains an open
problem. Motivated by the fine-grained nature and spatial locality of deepfakes, we propose the
Locality-Aware AutoEncoder (LAE) to bridge the generalization gap.
During training, we use a pixel-wise mask to regularize the local interpretation
of LAE, encouraging the model to learn intrinsic representations of the forgery
region rather than capturing dataset-specific artifacts and superficial
correlations. We further propose an active learning framework that selects
challenging candidates for labeling, requiring human-annotated masks for less
than 3% of the training data and dramatically reducing the annotation effort
needed to regularize interpretations. Experimental
results on three deepfake detection tasks indicate that LAE focuses on forgery
regions when making decisions. Further analysis shows that LAE outperforms
state-of-the-art methods by 6.52%, 12.03%, and 3.08%, respectively, on the three
tasks in terms of generalization accuracy on previously unseen manipulations.
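The core idea described above, aligning a model's local interpretation map with a pixel-wise forgery mask during training, could look roughly like the following minimal sketch. This is an illustrative assumption rather than the paper's implementation: the ToyDetector architecture, the attention head, and the lambda_attn weight are all hypothetical, and the autoencoder's reconstruction branch is omitted for brevity.

```python
# Minimal sketch (assumed, not the paper's code): a detector whose spatial
# attention map is regularized toward an annotated pixel-wise forgery mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDetector(nn.Module):
    """Toy encoder that exposes a spatial attention ("local interpretation") map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn_head = nn.Conv2d(32, 1, 1)   # local interpretation map
        self.classifier = nn.Linear(32, 1)     # real-vs-fake logit

    def forward(self, x):
        feats = self.encoder(x)                      # (B, 32, H/4, W/4)
        attn = torch.sigmoid(self.attn_head(feats))  # (B, 1, H/4, W/4)
        pooled = (feats * attn).mean(dim=(2, 3))     # attention-weighted pooling
        logit = self.classifier(pooled).squeeze(1)
        return logit, attn

def locality_aware_loss(logit, attn, label, mask, lambda_attn=1.0):
    """Classification loss plus a term pushing the attention map toward
    the pixel-wise forgery mask (illustrative combination of terms)."""
    cls_loss = F.binary_cross_entropy_with_logits(logit, label.float())
    mask_small = F.interpolate(mask, size=attn.shape[-2:], mode="nearest")
    attn_loss = F.binary_cross_entropy(attn, mask_small)
    return cls_loss + lambda_attn * attn_loss

# Usage with random tensors standing in for images, labels, and masks.
model = ToyDetector()
x = torch.randn(4, 3, 64, 64)
y = torch.randint(0, 2, (4,))
m = torch.randint(0, 2, (4, 1, 64, 64)).float()
logit, attn = model(x)
loss = locality_aware_loss(logit, attn, y, m)
loss.backward()
```

In this sketch, only the small fraction of samples that carry a human mask would contribute the attention term, which is consistent with the active learning setup described in the abstract; unmasked samples would simply use the classification loss.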