Collecting, Curating, and Annotating a Good-Quality Speech Deepfake Dataset for Famous Figures: Process and Challenges
Keywords: Text-to-Speech, Database, political figures
TL;DR: This paper presents a high-quality speech deepfake dataset for political figures using automated collection and synthesis methods, achieving strong naturalness (NISQA-TTS 3.69) and a 61.9% human misclassification rate.
Presentation Preference: Open to it if recommended by organizers
Abstract: Recent advances in speech synthesis have introduced unprecedented challenges in maintaining voice authenticity, particularly concerning public figures who are frequent targets of impersonation attacks. This paper presents a comprehensive methodology for collecting, curating, and generating synthetic speech data for political figures, along with a detailed analysis of the challenges encountered. We introduce a systematic approach that incorporates an automated pipeline for collecting high-quality bona fide speech samples, featuring transcription-based segmentation that significantly improves the quality of synthetic speech. We experimented with various synthesis approaches, from single-speaker to zero-shot synthesis, and documented the evolution of our methodology. The resulting dataset comprises bona fide and synthetic speech samples from ten public figures, demonstrating high quality with an NISQA-TTS naturalness score of 3.69 and a human misclassification rate of up to 61.9%.
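The abstract mentions transcription-based segmentation of long bona fide recordings into paired clip/transcript units. The sketch below illustrates one way such a step could look; it is an assumption, not the paper's actual pipeline. The tools (openai-whisper for transcription with segment timestamps, pydub for slicing), the file names, and the length bounds are all illustrative choices.

```python
# Minimal sketch of transcription-based segmentation, assuming Whisper for
# transcription and pydub for audio slicing; the paper's actual tooling is
# not specified in the abstract.
import os
import whisper
from pydub import AudioSegment

SOURCE = "interview.wav"        # hypothetical long-form bona fide recording
MIN_SEC, MAX_SEC = 3.0, 15.0    # assumed duration bounds for usable clips

os.makedirs("clips", exist_ok=True)
model = whisper.load_model("base")
result = model.transcribe(SOURCE)            # segments carry start/end times and text
audio = AudioSegment.from_wav(SOURCE)

kept = 0
for seg in result["segments"]:
    dur = seg["end"] - seg["start"]
    if not (MIN_SEC <= dur <= MAX_SEC):
        continue                              # skip clips too short or too long for TTS training
    clip = audio[int(seg["start"] * 1000):int(seg["end"] * 1000)]
    clip.export(f"clips/{kept:05d}.wav", format="wav")
    with open(f"clips/{kept:05d}.txt", "w") as f:
        f.write(seg["text"].strip())          # paired transcript for fine-tuning or zero-shot prompts
    kept += 1
```

The output is a directory of short, transcript-aligned clips, which is the usual input format for both single-speaker fine-tuning and zero-shot voice-cloning systems; the specific duration filter and storage layout here are assumptions.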
Submission Number: 22