Abstract: Current deepfake detection methods commonly use data augmentation and authenticity-content disentanglement to extract more generalized features for detection tasks. However, these methods rely exclusively on low-level spatial artifacts to distinguish real from fake images, which makes it difficult to capture the full range of forgery cues. Deepfakes also create discrepancies between forged and original facial features within the face-recognition (FR) embedding space, which can serve as an additional cue for detection. To better exploit the artifacts in deepfake images, we propose a novel detection method that enhances the detector's perceptual capability by incorporating not only real and fake samples during training, but also the visual residual between real and fake images. Meanwhile, we integrate the discrepancy in facial embeddings between real and fake samples into the training procedure for artifact extraction, where it serves as a guidance signal carrying the strong prior knowledge of a pretrained face-recognition model. A specialized distillation loss, together with additional cross-entropy losses, is designed to enhance detection capability. Experiments on multiple benchmarks demonstrate the superiority of the proposed approach over existing deepfake detection methods.
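The training objective sketched in the abstract combines classification losses on images and residuals with a distillation term that aligns detector features with the FR-embedding discrepancy. A minimal NumPy sketch under assumed details (cosine-distance distillation against the fake-minus-real embedding difference, equal loss weighting via a hypothetical `lam` parameter; the paper's exact losses may differ):

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy for one sample; label: 0 = real, 1 = fake.
    z = logits - logits.max()          # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def distill_loss(student_feat, fr_real, fr_fake):
    # Align the detector feature with the FR-embedding discrepancy
    # (fake minus real) via cosine distance. The discrepancy direction
    # is the guidance signal from the pretrained FR model.
    target = fr_fake - fr_real
    cos = student_feat @ target / (
        np.linalg.norm(student_feat) * np.linalg.norm(target) + 1e-8)
    return 1.0 - cos

def total_loss(logits_img, logits_res, label,
               student_feat, fr_real, fr_fake, lam=1.0):
    # CE on the input image + CE on the real/fake visual residual
    # + weighted distillation toward the FR-embedding discrepancy.
    return (cross_entropy(logits_img, label)
            + cross_entropy(logits_res, label)
            + lam * distill_loss(student_feat, fr_real, fr_fake))
```

When the detector feature is already parallel to the embedding discrepancy, the distillation term vanishes and only the classification losses drive training.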
External IDs: dblp:journals/tcsv/HanLWYZG25