PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder

Published: 01 Jan 2023, Last Modified: 12 Mar 2025, SSW 2023, CC BY-SA 4.0
Abstract: This paper describes a novel approach to non-parallel many-to-many voice conversion (VC) that utilizes a variant of the conditional variational autoencoder (VAE) called a perturbation-resistant VAE (PRVAE). In VAE-based VC, it is commonly assumed that the encoder extracts content from the input speech while removing source speaker information. Following this extraction, the decoder generates output from the extracted content and target speaker information. However, in practice, the encoded features may still retain source speaker information, which can lead to a degradation of speech quality during speaker conversion tasks. To address this issue, we propose a perturbation-resistant encoder trained to match the encoded features of the input speech with those of a pseudo-speech generated through a content-preserving transformation of the input speech’s fundamental frequency and spectral envelope using a combination of pure signal processing techniques. Our experimental results demonstrate that this straightforward constraint significantly enhances the performance in non-parallel many-to-many speaker conversion tasks. Audio samples can be accessed at http://www.kecl.ntt.co.jp/people/tanaka.ko/projects/prvaevc/.
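
The encoder constraint described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the network, the perturbation ranges, the distance measure, or the signal-processing pipeline, so everything below is an assumption made for illustration. In particular, `warp_envelope`, `perturb`, `encode`, and `consistency_loss` are hypothetical names, the encoder is a toy linear map standing in for a neural network, the perturbation is applied directly to features rather than by resynthesizing a pseudo-speech waveform, and the mean squared error is one plausible way to "match" the two sets of encoded features.

```python
import numpy as np

def warp_envelope(env, alpha):
    """Stretch or compress the frequency axis of a (frames, bins) envelope by alpha."""
    n_bins = env.shape[1]
    bins = np.arange(n_bins)
    # Position in the original envelope that each output bin reads from.
    src = np.clip(bins / alpha, 0, n_bins - 1)
    return np.stack([np.interp(src, bins, frame) for frame in env])

def perturb(env, f0, rng):
    """Change speaker-related cues (envelope warp, F0 scale) without touching timing."""
    alpha = rng.uniform(0.9, 1.1)  # assumed formant-shift range
    beta = rng.uniform(0.7, 1.4)   # assumed pitch-shift range
    return warp_envelope(env, alpha), f0 * beta

def encode(env, f0, W):
    """Toy linear encoder (a stand-in for the neural encoder)."""
    feats = np.concatenate([env, np.log1p(f0)[:, None]], axis=1)
    return feats @ W

def consistency_loss(env, f0, W, rng):
    """Mean squared distance between encodings of the input and its perturbed copy."""
    z = encode(env, f0, W)
    env_p, f0_p = perturb(env, f0, rng)
    z_p = encode(env_p, f0_p, W)
    return float(np.mean((z - z_p) ** 2))

rng = np.random.default_rng(0)
env = np.abs(rng.normal(size=(5, 16)))  # 5 frames, 16 frequency bins
f0 = rng.uniform(80.0, 300.0, size=5)   # per-frame F0 in Hz
W = rng.normal(size=(17, 4))            # 16 envelope bins + 1 F0 feature -> 4 dims
loss = consistency_loss(env, f0, W, rng)
```

In a training setup along these lines, a term like `loss` would presumably be added to the usual VAE objective with some weight, so that the encoder is penalized whenever its output changes under a transformation that alters speaker cues but preserves content.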