FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset

Hasam Khalid; Shahroz Tariq; Minha Kim; Simon S. Woo

Published: 11 Oct 2021, Last Modified: 04 May 2025

NeurIPS 2021 Datasets and Benchmarks Track (Round 2)

Readers: Everyone

Keywords: Audio-Video Deepfakes, Media Forensics, Multimodal dataset

TL;DR: A novel multimodal, racially unbiased deepfake dataset containing three types of deepfakes: Type 1 (fake audio), Type 2 (fake video), and Type 3 (fake audio and video).

Abstract: While significant advancements have been made in the generation of deepfakes using deep learning technologies, their misuse is now a well-known issue. Deepfakes can cause severe security and privacy issues, as they can be used to impersonate a person's identity in a video by replacing his/her face with another person's face. Recently, a new problem has been emerging: synthesizing a person's voice, where AI-based deep learning models can clone any person's voice from just a few seconds of audio. With the emerging threat of impersonation attacks using deepfake audio and video, a new generation of deepfake detectors is needed that focuses on video and audio jointly. Developing a competent deepfake detector typically requires a large amount of high-quality data that captures real-world (or practical) scenarios. Existing deepfake datasets contain either deepfake videos or deepfake audio, not both, and are racially biased as well. As a result, it is critical to develop a high-quality video and audio deepfake dataset that can be used to detect both audio and video deepfakes simultaneously. To fill this gap, we propose a novel Audio-Video Deepfake dataset, FakeAVCeleb, which contains not only deepfake videos but also the corresponding synthesized, lip-synced fake audio. We generate this dataset using the most popular current deepfake generation methods. We selected real YouTube videos of celebrities from four ethnic backgrounds to develop a more realistic multimodal dataset that addresses racial bias and further helps develop multimodal deepfake detectors. We performed several experiments using state-of-the-art detection methods to evaluate our deepfake dataset and demonstrate the challenges and usefulness of our multimodal Audio-Video deepfake dataset.

Supplementary Material: zip

URL: https://sites.google.com/view/fakeavcelebdash-lab/

Contribution Process Agreement: Yes

Dataset Url: https://sites.google.com/view/fakeavcelebdash-lab/home

Author Statement: Yes

Community Implementations: [5 code implementations](https://www.catalyzex.com/paper/fakeavceleb-a-novel-audio-video-multimodal/code)
