SA\(^3\)WT: Adaptive Wavelet-Based Transformer with Self-Paced Auto Augmentation for Face Forgery Detection

Published: 01 Jan 2024, Last Modified: 05 Mar 2025 · Int. J. Comput. Vis. 2024 · CC BY-SA 4.0
Abstract: Face forgery detection (FFD) on digital images has become increasingly challenging with the proliferation of sophisticated manipulation techniques. In this study, we propose a novel approach, named Adaptive Wavelet-based Transformer with Self-paced Auto Augmentation (SA\(^3\)WT), which naturally combines the global representation capabilities of visual transformers with adaptive enhancement of fine-grained artifacts in the frequency domain to effectively capture forgery patterns. In particular, to adequately handle various clues, the network incorporates a Wavelet-based Mixed Attention (WMA) Transformer block to better leverage the information residing in all frequency sub-bands, and a Residual Reserve Fine-grained Sampler (RRFS) to enhance detailed forgery artifacts while learning hierarchical global representations. By deeply mixing the modeling processes of global representations and fine-grained features throughout the network, the model captures rich forgery clues while bypassing the fusion issue that arises when they are extracted separately. Furthermore, a Self-paced Auto Augmentation Strategy (SAAS) facilitates model learning by unifying data augmentation and active learning in a coupled manner. Extensive experiments conducted on several benchmarks demonstrate the superiority of SA\(^3\)WT over state-of-the-art methods. Ablation studies and cross-dataset evaluations confirm the significance of the specifically designed modules, in terms of both effectiveness and generalization. Our findings suggest that pure visual transformers provide a promising direction for advanced forgery detection in real-world scenarios.
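The WMA block's use of "all frequency sub-bands" rests on a standard wavelet decomposition of the input. As an illustrative sketch only (the paper's actual implementation and wavelet choice are not specified here), a single-level 2D Haar transform splits an image into four sub-bands: a low-frequency approximation (LL) and three high-frequency detail bands (LH, HL, HH), where subtle manipulation artifacts often reside:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar wavelet transform (illustrative sketch).

    Splits a 2D array into four frequency sub-bands, each at half the
    spatial resolution: LL (approximation), LH and HL (horizontal and
    vertical detail), and HH (diagonal detail).
    """
    # Pairwise averages (low-pass) and differences (high-pass) along rows...
    lo_r = (x[0::2, :] + x[1::2, :]) / 2.0
    hi_r = (x[0::2, :] - x[1::2, :]) / 2.0
    # ...then the same split along columns, yielding the four sub-bands.
    ll = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2.0
    lh = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2.0
    hl = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2.0
    hh = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2.0
    return ll, lh, hl, hh

# A 4x4 toy "image" decomposes into four 2x2 sub-bands.
img = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
print(ll.shape)  # (2, 2)
```

In an architecture like the one described, all four sub-bands (not only LL) would be fed to attention so that high-frequency forgery traces are preserved rather than discarded.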