Comparing GMM-based speech transformation systems

Published: 2007, Last Modified: 07 Oct 2025, INTERSPEECH 2007, CC BY-SA 4.0
Abstract: This article studies GMM-based voice conversion systems. We compare the main linear conversion functions found in the literature on an identical speech corpus, with particular emphasis on the risks of over-fitting and over-smoothing. To minimize these risks, we propose three alternative robust conversion functions. We show, on two experimental speech databases, that the approach suggested by Kain remains the most precise but leads to an over-fitting ratio of 1.72%. The alternatives we propose show an average degradation of 2.8% with an over-fitting ratio of 0.52%.
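For readers unfamiliar with the linear conversion functions compared here, the following is a minimal sketch of the joint-density form commonly associated with Kain's approach: the converted vector is a posterior-weighted sum of per-component linear regressions, F(x) = sum_k p(k|x) [mu_y_k + Sigma_yx_k Sigma_xx_k^{-1} (x - mu_x_k)]. The abstract does not spell out this formula, so treat it as an assumption about the general technique rather than the paper's exact implementation. The function names (`gaussian_logpdf`, `convert`) and parameter names are hypothetical, and the GMM parameters are assumed to have been estimated beforehand.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at each row of x."""
    d = x.shape[1]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    # Solve L z = diff^T; the squared norm of z is the Mahalanobis distance.
    z = np.linalg.solve(chol, diff.T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def convert(x, weights, mu_x, mu_y, sigma_xx, sigma_yx):
    """Apply a joint-density GMM conversion function to source vectors.

    Assumed (hypothetical) parameter layout, for K components:
      weights:  (K,)       mixture weights
      mu_x:     (K, Dx)    source means
      mu_y:     (K, Dy)    target means
      sigma_xx: (K, Dx, Dx) source covariances
      sigma_yx: (K, Dy, Dx) target/source cross-covariances
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n_comp = len(weights)

    # Posterior p(k | x) from the marginal GMM over source vectors,
    # computed in the log domain for numerical stability.
    log_post = np.stack(
        [np.log(weights[k]) + gaussian_logpdf(x, mu_x[k], sigma_xx[k])
         for k in range(n_comp)],
        axis=1,
    )
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    y = np.zeros((x.shape[0], mu_y.shape[1]))
    for k in range(n_comp):
        # A_k = Sigma_yx Sigma_xx^{-1}, obtained without an explicit inverse.
        a_k = np.linalg.solve(sigma_xx[k].T, sigma_yx[k].T).T
        y += post[:, [k]] * (mu_y[k] + (x - mu_x[k]) @ a_k.T)
    return y
```

Because each component contributes a full regression matrix, the number of free parameters grows quickly with the number of components and the feature dimension, which is one plausible reason such a function can over-fit a small training corpus.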