Leveraging Effective Language and Speaker Conditioning in Indic TTS for LIMMITS 2024 Challenge

Published: 01 Jan 2024, Last Modified: 06 Oct 2024, ICASSP Workshops 2024, CC BY-SA 4.0
Abstract: This paper describes the model developed by the NLP_POSTECH team for the LIMMITS 2024 Grand Challenge. Of the three tracks, we focus on Track 1, which requires a few-shot text-to-speech (TTS) system that generates natural speech across diverse languages. To provide multilingual capability, we incorporate a learnable language embedding. To imitate target speaker voices precisely, we use an inductive speaker bias conditioning method. Despite the simplicity of this strategy, our model generates natural speech and preserves high speaker fidelity in both mono-lingual and cross-lingual settings.
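
As a rough illustration of what a learnable language embedding can look like, the sketch below keeps one trainable vector per language and adds it to every time step of a text encoder's output. The abstract does not say how the embedding is injected, so the additive scheme, the sizes, the language codes, and the names `language_table` and `condition_on_language` are all assumptions made for this example, not the authors' implementation.

```python
import random

# Hypothetical sizes and language IDs, chosen only for illustration.
HIDDEN_DIM = 4
LANGUAGES = ["hi", "te", "mr"]

random.seed(0)

# One trainable vector per language. In a real model these values would be
# updated by backpropagation together with the rest of the TTS network;
# here they are only randomly initialized.
language_table = {
    lang: [random.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)]
    for lang in LANGUAGES
}


def condition_on_language(encoder_states, lang):
    """Add the language vector to every time step of the encoder output.

    Additive conditioning is an assumption of this sketch; concatenation
    or other schemes are equally plausible.
    """
    lang_vec = language_table[lang]
    return [
        [h + e for h, e in zip(step, lang_vec)]
        for step in encoder_states
    ]


# Two time steps of dummy text-encoder output.
states = [[0.0] * HIDDEN_DIM, [1.0] * HIDDEN_DIM]
conditioned = condition_on_language(states, "hi")
```

Because the language is selected by a lookup key, the same text encoder can in principle be reused across languages, with only the added vector changing per utterance.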