Data augmentation with attention framework for robust deepfake detection

Sardor Mamarasulov, Lianggangxu Chen, Changgu Chen, Yang Li, Changbo Wang

Published: 2025, Last Modified: 05 Nov 2025, Vis. Comput. 2025, CC BY-SA 4.0

Abstract: Deepfake detection has become an essential task in combating the proliferation of manipulated media. Current deepfake detection methods typically use a Sequential DeepFake Manipulation Architecture to model facial features, and they often suffer from overfitting. The main reason is that the model becomes too specialized in recognizing the limited set of manipulated faces in the training data. In this paper, a Data Augmentation with Attention Framework is proposed to address overfitting for robust deepfake detection. First, we advance the existing Sequential DeepFake Manipulation Architecture by integrating Grad-CAM to focus on critical facial regions, thereby enhancing the interpretability and optimization of the model. This step not only preserves the integrity of important facial features but also promotes robust feature learning and model generalization. Then, we incorporate CutMix augmentation, strategically applying it to the key areas identified by Grad-CAM. We hypothesize that incorporating these augmentations into the existing method can further enhance its ability to detect deepfake manipulations. By utilizing CutMix to blend image patches, we introduce additional perturbations that encourage the model to learn more discriminative features and generalize better to unseen data. Extensive evaluations show that our approach outperforms state-of-the-art methods by a large margin. The semantic meaning of our method is also verified by the visualization results. In particular, our experiments on the Seq-DeepFake dataset demonstrate the effectiveness of this approach.
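
The abstract does not specify how CutMix is tied to the Grad-CAM regions, so the following is only a minimal sketch of one plausible reading: the pasted box is centered on the peak of a saliency map. The function name `saliency_guided_cutmix`, its parameters, and the choice to center the box on the saliency maximum are all assumptions for illustration, not the authors' implementation. The saliency map is assumed to be precomputed (for example by Grad-CAM), and the returned mixing fraction follows the standard CutMix area-ratio convention; how it would map onto sequence labels is not stated in the source.

```python
import numpy as np


def saliency_guided_cutmix(img_a, img_b, saliency_a, lam):
    """Paste a box from img_b into img_a, centered on the peak of saliency_a.

    Hypothetical helper (not from the paper).
    img_a, img_b: arrays of shape (H, W, C) with identical shapes.
    saliency_a:   array of shape (H, W), e.g. a precomputed Grad-CAM heatmap.
    lam:          target fraction of img_a to keep, in [0, 1].
    Returns the mixed image and the fraction of img_a actually kept.
    """
    h, w = saliency_a.shape

    # Standard CutMix box size: the box covers roughly (1 - lam) of the image.
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)

    # Assumption: center the box on the most salient pixel.
    cy, cx = np.unravel_index(np.argmax(saliency_a), saliency_a.shape)

    # Clip the box to the image bounds.
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)

    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]

    # Recompute the kept fraction, since clipping may have shrunk the box.
    lam_adj = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)
    return mixed, float(lam_adj)
```

Because the box is clipped at the image border, the returned fraction can differ from the requested `lam`; any label mixing should use the returned value rather than the requested one.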

External IDs: dblp:journals/vc/MamarasulovCCLW25