Abstract: Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited
generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we elaborately investigate the generalization
capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first observe a rapid degradation
of detection accuracy when models are directly trained on combined datasets, owing to the discrepancies across collection scenarios and
generation methods. To address this issue, a Generalized Multi-Scenario Deepfake Detection framework (GM-DF) is proposed to
serve multiple real-world scenarios by a unified model. First, we propose a hybrid expert modeling approach for domain-specific
real/forgery feature extraction. In addition, for the commonality representation, we employ CLIP to extract common features that better
align visual and textual features across domains. Meanwhile, we introduce a masked image reconstruction mechanism to force
models to capture rich forged details. Finally, we supervise the models via a domain-aware meta-learning strategy to further enhance
their generalization capacities. Specifically, we design a novel domain alignment loss to tightly align the distributions of the meta-test
domains and meta-train domains. Thus, the updated models are able to represent both specific and common real/forgery features
across multiple datasets. Given the scarcity of studies on multi-dataset training, we establish a new benchmark leveraging
multi-source data to fairly evaluate models' generalization capacity on unseen scenarios. Qualitative and quantitative
experiments on five datasets, conducted under both traditional protocols and the proposed benchmark, demonstrate the effectiveness of
our approach.