Abstract: Face forgery detection has grown substantially in importance with the emergence of facial manipulation technologies. Recent methods perform forgery detection in the spatial-frequency domain and achieve improved overall performance. Nonetheless, these methods do not reliably cover the full range of forgery techniques, and networks trained on public datasets struggle to accurately quantify their uncertainty. In this work, we design a Dynamic Dual-spectrum Interaction Network that supports test-time training with uncertainty guidance and spatial-frequency prompt learning. RGB and frequency features first interact at multiple levels through a Frequency-guided Attention Module, and these multi-modal features are then merged by a Dynamic Fusion Module. We further exploit uncertainty perturbation as guidance during the test-time training phase, acting as a bias on the fusion weights of uncertain data during dynamic fusion. Furthermore, we propose a spatial-frequency prompt learning method that effectively enhances the generalization of the forgery detection model. Finally, we curate a new, extensive dataset containing images synthesized by a variety of diffusion and non-diffusion methods. Comprehensive experiments show that our method achieves superior face forgery detection performance compared with recent state-of-the-art methods.
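To make the dual-branch idea concrete, the short PyTorch sketch below gates a spatial (RGB) feature map and a frequency feature map with a learned per-pixel weight. It is not the paper's Frequency-guided Attention Module or Dynamic Fusion Module; the class name `DualSpectrumFusion`, the channel width, and the FFT-magnitude frequency branch are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class DualSpectrumFusion(nn.Module):
    """Illustrative sketch only: gate-weighted fusion of RGB and frequency features.

    This is a hypothetical stand-in, not the paper's Dynamic Fusion Module.
    Uncertainty guidance could, in principle, bias the gate output `w`.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        # Hypothetical 1x1 projections for each branch.
        self.rgb_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.freq_proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Gate predicting a per-pixel fusion weight in [0, 1] from both branches.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, freq_feat: torch.Tensor) -> torch.Tensor:
        r = self.rgb_proj(rgb_feat)
        f = self.freq_proj(freq_feat)
        # Dynamic fusion weight; higher w favors the spatial branch.
        w = self.gate(torch.cat([r, f], dim=1))
        return w * r + (1.0 - w) * f


if __name__ == "__main__":
    x_rgb = torch.randn(2, 64, 32, 32)
    # Toy "frequency" branch: magnitude of the 2-D FFT of the same features.
    x_freq = torch.fft.fft2(x_rgb).abs()
    fused = DualSpectrumFusion(64)(x_rgb, x_freq)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```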