Keywords: face forgery detection, multitask learning, joint embedding, vision-language correspondence
Abstract: Multitask learning for face forgery detection has achieved impressive success in recent years. Nevertheless, previous methods generally overlook the semantic relationships among different forgery detection tasks, which weakens knowledge transfer across tasks. Moreover, previously adopted multitask learning schemes require human intervention in allocating model capacity to each task and in setting the loss weights, which is bound to be suboptimal. In this paper, we aim at automated multitask learning for face forgery detection from a joint embedding perspective. We first define a set of coarse-to-fine face forgery detection tasks based on face attributes at different semantic levels. We describe the ground truth for each task via a textual template, and train two encoders to jointly embed visual face images and textual descriptions in a shared feature space. In this manner, the semantic closeness between two tasks is manifested as the distance between their embeddings in the learned feature space. Moreover, the capacity of the image encoder can be automatically allocated to each task through end-to-end optimization. Through joint embedding, face forgery detection is performed by maximizing the feature similarity between the test face image and the candidate textual descriptions. Extensive experiments show that the proposed method improves the generalization of face forgery detection to novel face manipulations. In addition, our multitask learning method affords a degree of model interpretability by providing human-understandable explanations.
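For concreteness, the similarity-based inference described above can be sketched as follows. This is a minimal, hypothetical illustration assuming CLIP-style encoders that map images and texts into a shared embedding space; the encoder architectures, embedding dimension, and templates here are placeholders, not the paper's actual components.

```python
import torch
import torch.nn.functional as F

def classify(image_feat: torch.Tensor, text_feats: torch.Tensor) -> int:
    """Pick the candidate textual description whose embedding has the
    highest cosine similarity to the test face image embedding."""
    image_feat = F.normalize(image_feat, dim=-1)  # unit-norm image embedding
    text_feats = F.normalize(text_feats, dim=-1)  # unit-norm text embeddings
    sims = text_feats @ image_feat                # cosine similarities, one per description
    return int(sims.argmax())                     # index of the best-matching description

# Toy usage: random features stand in for the (unspecified) encoder outputs.
templates = ["a photo of a real face", "a photo of a manipulated face"]
img = torch.randn(512)                  # placeholder for image_encoder(face)
txt = torch.randn(len(templates), 512)  # placeholder for text_encoder(templates)
print(templates[classify(img, txt)])
```

Under this formulation, adding a new detection task amounts to adding candidate descriptions, and the same shared space measures semantic closeness between tasks.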
Supplementary Material: pdf
Submission Number: 10131