A Study on Combining Non-Parallel and Parallel Methodologies for Mandarin-English Cross-Lingual Voice Conversion

Chang Huai You, Minghui Dong

Published: 2024, Last Modified: 09 Mar 2026, ICASSP 2024, CC BY-SA 4.0
Abstract: In this paper, we propose a cross-lingual voice conversion (VC) scheme leveraging both non-parallel and parallel methodologies. The goal of cross-lingual VC is to transform the voice of a speaker from a dataset in one language into the voice of another speaker from a dataset in a different language. First, two non-parallel methods are investigated separately: CycleGAN-VC2 and phonetic posteriorgram (PPG) VC. Second, two different parallel VC systems are developed to enhance the quality of the converted speech spectrogram, where the output speech from the non-parallel VC is paired with the corresponding original speech to form the parallel pair. Focusing on Mandarin-English bilingual databases, the proposed VC scheme improves speech naturalness and speaker similarity compared to the baseline non-parallel methods.