Expressive Speech Synthesis with Theme-Oriented Few-Shot Learning in ICAGC 2024

Published: 01 Jan 2024, Last Modified: 14 Apr 2025. Venue: ISCSLP 2024. License: CC BY-SA 4.0
Abstract: This paper introduces an expressive speech synthesis system submitted to Track 1 of ICAGC 2024. The objective of this track is to clone the voices of the target speakers from the provided speech data and to modulate them so that they convey emotions appropriate to various themes, such as novel chapters and ancient Chinese poems. Our system is built primarily on a pre-trained GPT-SoVITS, a two-stage large-scale speech synthesis system. In addition, we have developed a theme-oriented few-shot learning strategy, which fine-tunes the pre-trained models on sentences spoken by different speakers but on the same theme. This approach aims to refine the models to focus on both the specific theme and individual speaker characteristics. The competition results underscore the efficacy of our approach, which placed fourth among all participating teams.
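The data-selection side of the theme-oriented strategy can be illustrated with a minimal sketch. It is not the authors' code: the `Utterance` record, the `build_theme_sets` helper, the theme labels, and the `max_per_speaker` cap are all hypothetical names introduced here. The sketch only shows one plausible way to collect a small, same-theme set of sentences from several speakers before fine-tuning; the fine-tuning of GPT-SoVITS itself is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str  # hypothetical speaker ID
    theme: str    # hypothetical theme label, e.g. "poem" or "novel"
    text: str     # transcript of the recorded sentence


def build_theme_sets(utterances, max_per_speaker=5):
    """Group utterances by theme, keeping at most `max_per_speaker`
    sentences per speaker within each theme (the few-shot cap is an
    assumption made for this sketch)."""
    per_theme = defaultdict(list)
    counts = defaultdict(int)
    for utt in utterances:
        key = (utt.theme, utt.speaker)
        if counts[key] < max_per_speaker:
            counts[key] += 1
            per_theme[utt.theme].append(utt)
    return dict(per_theme)


# Toy corpus: two speakers, two themes.
corpus = [
    Utterance("spk_a", "poem", "line 1"),
    Utterance("spk_b", "poem", "line 2"),
    Utterance("spk_a", "novel", "sentence 1"),
    Utterance("spk_a", "poem", "line 3"),
]
theme_sets = build_theme_sets(corpus, max_per_speaker=1)
```

Under these assumptions, each entry of `theme_sets` would serve as the fine-tuning set for one theme-specific copy of the pre-trained models.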