The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge

Published: 01 Jan 2024, Last Modified: 18 May 2025, ISCSLP 2024, CC BY-SA 4.0
Abstract: Discrete speech tokens have become increasingly popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS), and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. Notably, we ranked first on the leaderboard (https://huggingface.co/spaces/discrete-speech/interspeech2024_discrete_speech_tts_full) in the TTS track, both with the whole training set and with only 1 hour of training data, with the highest UTMOS score and the lowest bitrate among all submissions.