Abstract: Highlights
• Introduces a new task, emotion-selectable text-based speech editing, together with the Emo-CampNet model, which uses decoupling and reconstruction techniques for precise emotional control during text-based speech editing.
• Introduces a neutral content generator, optimized with a generative adversarial network, so that the emotion in the generated speech is determined solely by the input emotion attributes; emotional elements of the original speech are effectively removed.
• Introduces two data augmentation techniques that enrich the emotional and pronunciation information in the training data, improving model performance and enabling editing of speech from unseen speakers.