CVC: Contrastive Learning for Non-Parallel Voice Conversion

Published: 01 Jan 2021, Last Modified: 17 May 2023, Venue: Interspeech 2021, Readers: Everyone
Abstract: Cycle-consistent generative adversarial network (CycleGAN) and variational autoencoder (VAE) based models have recently gained popularity in non-parallel voice conversion. However, they often suffer from a difficult training process and unsatisfactory results. In this paper, we propose a contrastive learning-based adversarial approach for voice conversion, namely contrastive voice conversion (CVC). Compared to previous CycleGAN-based methods, CVC requires only efficient one-way GAN training by taking advantage of contrastive learning. For non-parallel one-to-one voice conversion, CVC is on par with or better than CycleGAN and VAE while effectively reducing training time. CVC further demonstrates superior performance in many-to-one voice conversion, enabling conversion from unseen speakers.
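
The abstract does not state which contrastive objective CVC uses. As a rough illustration of the general idea only, the sketch below assumes an InfoNCE-style loss, a common contrastive objective: a query feature is pulled toward its matching (positive) feature and pushed away from non-matching (negative) features. The function name `info_nce_loss`, the temperature value, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE-style loss for a single query vector.

    Assumption: this is a generic contrastive objective, not necessarily
    the exact loss used by CVC. All inputs are plain lists of floats.
    """
    def cosine(a, b):
        # Cosine similarity between two non-zero vectors.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    # Index 0 holds the positive pair; the rest are negatives.
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))

    # Cross-entropy with the positive as the target class.
    return log_sum_exp - logits[0]

# Toy usage: an aligned positive gives a much lower loss than a mismatched one.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
mismatched = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In a voice conversion setting, the query and positive would plausibly be features of corresponding segments of the converted and source speech, so that linguistic content is preserved without a second, cycle-closing generator; that mapping is an interpretation of the abstract, not a confirmed detail.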