Identity-Driven Multimedia Forgery Detection via Reference Assistance

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM 2024 Oral · License: CC BY 4.0
Abstract: Recent technological advancements, such as "deepfake" techniques, have paved the way for generating various media forgeries. In response to the potential hazards of these forgeries, many researchers have explored detection methods, which increases the demand for high-quality media forgery datasets. However, existing datasets have several limitations. First, most datasets focus on manipulating the visual modality and lack diversity, as only a few forgery approaches are considered. Second, the media often lack clarity and naturalness, and dataset sizes are limited. Third, real-world forgeries are commonly identity-driven, yet the identity information of the individuals portrayed in existing datasets remains under-explored. For detection, identity information can serve as an essential clue to boost performance. Moreover, official media concerning the relevant identities on the Internet can serve as prior knowledge, aiding both the audience and forgery detectors in determining the true identity. Therefore, we propose an identity-driven multimedia forgery dataset, IDForge, which contains 249,138 video shots. All video shots are sourced from 324 wild videos of 54 celebrities collected from the Internet. The fake video shots involve 9 types of manipulation across visual, audio, and textual modalities. Additionally, IDForge provides an additional 214,438 real video shots as a reference set for the 54 celebrities. Correspondingly, we design an effective multimedia detection network termed the Reference-assisted Multimodal Forgery Detection Network (R-MFDN). Through extensive experiments on the proposed dataset, we demonstrate the effectiveness of R-MFDN on the multimedia forgery detection task.
Primary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: This work contributes to multimedia processing by proposing an identity-driven multimedia forgery dataset, IDForge. While many existing forgery datasets focus primarily on single-modal data and under-explore the identity information of the speakers in forged media, IDForge provides a rich identity-relevant resource by focusing on high-quality video shots of celebrities across visual, audio, and textual modalities. Furthermore, this paper proposes a novel method, the Reference-assisted Multimodal Forgery Detection Network (R-MFDN), which leverages identity-aware and cross-modal contrastive learning to exploit identity inconsistencies across different modalities. Through extensive experiments on the proposed dataset, we demonstrate that IDForge is challenging for existing forgery detection methods and that R-MFDN is effective on this task.
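Note: the submission page does not include code. As a rough illustration of what a cross-modal contrastive objective of the kind mentioned above typically looks like, the sketch below implements a generic InfoNCE-style loss between visual and audio embeddings of the same video shot. The function name, arguments, and temperature value are illustrative assumptions and should not be read as the authors' R-MFDN implementation.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(visual_emb, audio_emb, temperature=0.07):
    """Generic InfoNCE-style cross-modal contrastive loss (illustrative sketch).

    visual_emb, audio_emb: tensors of shape (batch, dim); row i of each tensor
    is assumed to come from the same video shot, so diagonal pairs are positives
    and all off-diagonal pairs are negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    v = F.normalize(visual_emb, dim=-1)
    a = F.normalize(audio_emb, dim=-1)

    # Pairwise similarity matrix scaled by temperature: logits[i, j] = sim(v_i, a_j) / T.
    logits = v @ a.t() / temperature

    # Matching (same-shot) pairs lie on the diagonal.
    targets = torch.arange(v.size(0), device=v.device)

    # Symmetric cross-entropy over both retrieval directions (visual->audio, audio->visual).
    loss_v2a = F.cross_entropy(logits, targets)
    loss_a2v = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_v2a + loss_a2v)
```

In an identity-aware variant, positives and negatives would instead be defined by identity labels (e.g., pulling together embeddings of the same celebrity and pushing apart different ones), but the exact formulation used in the paper is described in the full text rather than here.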
Supplementary Material: zip
Submission Number: 1270