Emotion selectable end-to-end text-based speech editing

Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, Chu Yuan Zhang

Published: 2024, Last Modified: 14 Apr 2025. Artif. Intell. 2024. CC BY-SA 4.0

Abstract: Highlights
• Introduces a new task, emotion selectable text-based speech editing, together with the Emo-CampNet model, which employs decoupling and reconstruction techniques for precise emotional control during text-based speech editing.
• Introduces a neutral content generator, optimized with a generative adversarial network, that effectively removes emotional elements from the original speech so that the emotion in the generated speech is determined solely by the input emotion attributes.
• Introduces two data augmentation techniques that enrich the emotional and pronunciation information of the training data, improving model performance and enabling editing of speech from unseen speakers.