Quality assessment of voice converted speech using articulatory features

2017 (modified: 16 Sept 2021), ICASSP 2017
Abstract: We propose a novel application of acoustic-to-articulatory inversion (AAI) to the quality assessment of voice-converted speech. Humans speak effortlessly through the coordinated movement of various articulators, muscles, etc. This coordinated movement contributes to naturalness, intelligibility, and speaker identity, which are only partially present in voice-converted speech. Hence, information related to speech production is lost during voice conversion (VC). In this paper, this loss is quantified for a male voice by showing an increase in RMSE (up to 12.7% for the tongue tip) for voice-converted speech, followed by a decrease in mutual information (I) (by 8.7%). Similar results are obtained for a female voice. This observation is extended by showing that articulatory features can be used as an objective measure. The effectiveness of the proposed measure over mel-cepstral distortion (MCD) is illustrated by comparing their correlation with the Mean Opinion Score (MOS). Moreover, the preference score of MCD contradicted the ABX test by 100%, whereas the proposed measure supported the ABX test by 45.8% and 16.7% for female-to-male and male-to-female VC, respectively.
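The two objective quantities named in the abstract, RMSE over articulatory trajectories and MCD over cepstral features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the inputs are assumed to be already time-aligned NumPy arrays, the MCD follows the commonly used frame-averaged definition that excludes the 0th (energy) coefficient, and the mutual-information step is omitted.

```python
import numpy as np


def rmse(reference, estimate):
    """Root-mean-square error between two equally shaped trajectories."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def relative_increase_percent(baseline, value):
    """Percentage increase of `value` over `baseline`."""
    return 100.0 * (value - baseline) / baseline


def mel_cepstral_distortion(mcep_ref, mcep_conv):
    """Frame-averaged MCD in dB for aligned (frames, dims) arrays.

    Assumption: column 0 holds the energy coefficient and is skipped.
    """
    ref = np.asarray(mcep_ref, dtype=float)[:, 1:]
    conv = np.asarray(mcep_conv, dtype=float)[:, 1:]
    per_frame = np.sqrt(2.0 * np.sum((ref - conv) ** 2, axis=1))
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))
```

Under these assumptions, one would compute `rmse` between measured and AAI-estimated articulator positions, once for natural and once for voice-converted speech, and report the change with `relative_increase_percent`.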