[Proposal-ML] Mimicking Humanity: A Synthetic Data-Based Approach to Voice Cloning in Text-to-Speech Systems

28 Oct 2024 (modified: 05 Nov 2024) · THU 2024 Fall AML Submission · Readers: Everyone · License: CC BY 4.0
Keywords: Voice Cloning, Speaker Generalization, Text-to-Speech, Synthetic Data, Privacy
Abstract: Acquiring training data poses significant challenges in many areas of machine learning, and data scarcity can both reduce model accuracy and increase bias. One way to obtain more training data is to synthesize it with external tools. This project presents a voice cloning technique for text-to-speech (TTS) systems that is trained on synthetic data, addressing the data-accessibility and privacy issues of conventional TTS models. Using synthetic voice data, the model is trained to reproduce voice characteristics such as pitch, tone, and timbre, so that it can generate speech that mimics a specific speaker while preserving the original linguistic content. Because the method does not require large human speech datasets, it aims to improve generalization across speaker profiles and languages and to enable local inference without proprietary data. If successful, the project will demonstrate that synthetic data can be used to train AI systems, provided the target task is chosen appropriately.
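The abstract does not say how voice features such as pitch are represented. As a hedged illustration only, and not the authors' method, one common way to quantify pitch is to estimate the fundamental frequency (F0) of a voiced frame from the peak of its autocorrelation. The function name `estimate_f0`, the 60 to 400 Hz search range, and the 16 kHz sample rate are assumptions chosen for this sketch.

```python
import numpy as np

def estimate_f0(frame: np.ndarray, sample_rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Hypothetical helper: estimate the fundamental frequency (pitch) of one
    voiced frame by locating the autocorrelation peak within [fmin, fmax]."""
    frame = frame - frame.mean()  # remove any DC offset
    # Autocorrelation; keep only non-negative lags (index = lag in samples).
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag = min_lag + int(np.argmax(corr[min_lag:max_lag + 1]))
    return sample_rate / best_lag

# Usage: a synthetic 200 Hz tone stands in for a 50 ms voiced speech frame.
sr = 16000
t = np.arange(int(0.05 * sr)) / sr
tone = np.sin(2 * np.pi * 200.0 * t)
print(round(estimate_f0(tone, sr), 1))  # prints 200.0
```

Autocorrelation is used here only because it is short and easy to verify; practical pipelines typically use more robust pitch trackers, and timbre would need a separate representation such as spectral features or a learned speaker embedding.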
Submission Number: 30