Abstract: We propose a unified approach that simultaneously addresses the conventional setting of binary deepfake classification
and the more challenging scenario of uncovering which facial
components have been forged as well as the exact order of
the manipulations. To solve the former task, we adopt multiple instance learning (MIL), which treats each image as a bag
and its patches as instances. A positive bag corresponds to
a forged image that includes at least one manipulated patch
(i.e., a pixel in the feature map). The formulation allows us
to estimate the probability of an input image being a fake
one and establish the corresponding contrastive MIL loss.
On the other hand, tackling the component-wise deepfake
problem reduces to multi-label prediction, but
the requirement to recover the manipulation order further
complicates the learning task, turning it into a multi-label ranking problem. We resolve this difficulty by designing a tailor-made
loss term to enforce that the rank order of the predicted
multi-label probabilities respects the ground-truth order of
the sequential modifications of a deepfake image. Extensive
experiments, comparisons with other relevant
techniques, and ablation studies
demonstrate that the proposed method is an overall more
comprehensive solution to deepfake detection.
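
As an illustration of the two ideas summarized above, the following minimal sketch shows one possible way to (a) pool patch-level scores into a bag-level fake probability and (b) penalize predicted label probabilities whose ranking violates the ground-truth manipulation order. It is not the loss defined in this work: the noisy-OR pooling, the pairwise hinge form, the margin value, the convention that earlier manipulations should receive higher probabilities, and the function names `bag_fake_probability` and `pairwise_order_loss` are all assumptions made for illustration.

```python
import math
from typing import Sequence


def bag_fake_probability(patch_probs: Sequence[float]) -> float:
    """Noisy-OR pooling (an assumed choice): a bag (image) is fake if at
    least one instance (patch) is manipulated, so
    P(fake) = 1 - prod_i (1 - p_i)."""
    # Sum logs for numerical stability; clamp so log1p never sees -1.
    log_all_real = sum(math.log1p(-min(p, 1.0 - 1e-12)) for p in patch_probs)
    return 1.0 - math.exp(log_all_real)


def pairwise_order_loss(
    probs: Sequence[float], order: Sequence[int], margin: float = 0.1
) -> float:
    """Pairwise hinge penalty (an assumed form) on rank violations.

    probs : predicted probability for each facial component (label).
    order : component indices in ground-truth manipulation order,
            first-applied manipulation first.
    Assumed convention: an earlier manipulation should score at least
    `margin` higher than any later one.
    """
    loss = 0.0
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            gap = probs[order[i]] - probs[order[j]]
            loss += max(0.0, margin - gap)
    return loss
```

Under these assumptions, a bag whose patches all score 0 has fake probability 0, and the ranking penalty vanishes whenever the predicted probabilities already decrease along the ground-truth order by at least the margin.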