ASR-Free Pronunciation Assessment

Published: 01 Jan 2020, Last Modified: 30 May 2024. Venue: INTERSPEECH 2020. License: CC BY-SA 4.0
Abstract: Most pronunciation assessment methods rely on local features derived from automatic speech recognition (ASR), e.g., the Goodness of Pronunciation (GOP) score. In this paper, we investigate an ASR-free scoring approach derived from the marginal distribution of raw speech signals. The hypothesis is that even a listener with no knowledge of the language, who therefore cannot recognize its phones or words, can still judge how good a pronunciation is by comparing it with speech data from the target language. Our analysis shows that this new scoring approach offers a correction for the phone-competition problem of GOP. Experimental results on the ERJ dataset demonstrate that combining the ASR-free score with GOP achieves better performance than the GOP baseline.
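To make the idea of combining the two scores concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract does not say how GOP is computed or how the scores are fused. The sketch assumes a common GOP variant (the frame-averaged log posterior of the canonical phone) and a simple linear interpolation. The names `gop_score`, `combine_scores`, and `weight` are hypothetical, and the ASR-free score is treated as an opaque number, for example a log-likelihood from some model of speech signals.

```python
import math


def gop_score(frame_posteriors, canonical_phone):
    """Frame-averaged log posterior of the canonical phone.

    Assumption: this is one common GOP variant, not necessarily
    the definition used in the paper.
    frame_posteriors: list of dicts mapping phone label -> posterior.
    """
    logs = [math.log(frame[canonical_phone]) for frame in frame_posteriors]
    return sum(logs) / len(logs)


def combine_scores(gop, asr_free, weight=0.5):
    """Linear interpolation of the two scores.

    Assumption: both the linear form and the weight are illustrative;
    the abstract only says the scores are combined.
    """
    return weight * gop + (1.0 - weight) * asr_free


# Two frames in which phones "a" and "e" split the posterior mass evenly.
frames = [{"a": 0.5, "e": 0.5}, {"a": 0.5, "e": 0.5}]
gop = gop_score(frames, "a")

# Placeholder ASR-free score (hypothetical value, not from the paper).
asr_free = -1.0
combined = combine_scores(gop, asr_free, weight=0.5)
```

In this toy input the canonical phone shares its posterior mass with a competing phone, so the GOP value is pulled down regardless of overall speech quality. A score that does not depend on phone posteriors is unaffected by that split, which is one way to read the complementarity the abstract describes.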