DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech

Kazuki Kawamura, Jun Rekimoto

Published: 01 Jan 2022, Last Modified: 07 Feb 2025, ICMLA 2022, CC BY-SA 4.0

Abstract: When beginners learn to speak a non-native language, it is difficult for them to judge for themselves whether they are speaking well. Computer-assisted pronunciation training (CAPT) systems are therefore used to detect learners' mispronunciations. These systems typically compare the user's speech with that of a native speaker in units of rhythm, phonemes, or words and calculate the differences. However, they require extensive speech data from both non-native and native speakers with detailed annotations, which is usually difficult to collect. To overcome this problem, we propose a new language learning support system that detects beginners' mispronunciations using only a small amount of unannotated native speaker speech data. The proposed system uses deep learning–based speech processing to display the pronunciation score of the learner's speech and the difference and distance between the learner's and the model's pronunciation in an intuitive, visual manner. Learners can gradually improve their pronunciation by eliminating these differences and shortening the distance from the model until they become sufficiently proficient. We also built an application to help non-native speakers learn English and confirmed that it can improve users' speech intelligibility.
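To make the idea of a "distance from model speech" concrete, here is a minimal sketch using dynamic time warping (DTW), one common way to compare two utterances while tolerating differences in speaking rate. It assumes each utterance has already been turned into a sequence of frame-level feature vectors (for example, MFCCs or embeddings from a pretrained speech model). DTW, the Euclidean frame cost, the length normalization, and the names `euclidean` and `dtw_distance` are illustrative assumptions, not the measure used by the system described here.

```python
import math
from typing import Sequence

# A frame is one feature vector (assumed: MFCCs or speech-model embeddings).
Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two equal-length feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(learner: Sequence[Frame], model: Sequence[Frame]) -> float:
    """Length-normalized DTW distance between learner and model frame sequences.

    Hypothetical illustration only; not the system's actual distance measure.
    """
    n, m = len(learner), len(model)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = minimal accumulated cost of aligning learner[:i] with model[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(learner[i - 1], model[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance learner only (model frame reused)
                cost[i][j - 1],      # advance model only (learner frame reused)
                cost[i - 1][j - 1],  # advance both (frames matched one-to-one)
            )
    # Divide by n + m so longer utterances are not penalized for length alone.
    return cost[n][m] / (n + m)


if __name__ == "__main__":
    model = [[0.0], [1.0], [2.0]]
    slow_learner = [[0.0], [0.0], [1.0], [2.0]]  # same contour, spoken more slowly
    off_learner = [[0.0], [2.0], [3.0]]          # same length, different values
    print(dtw_distance(slow_learner, model))  # tempo difference alone costs nothing
    print(dtw_distance(off_learner, model))   # mismatched frames give a positive distance
```

Because DTW warps the time axis, a learner who speaks more slowly but with the same feature contour gets a distance of zero, so the remaining distance reflects what differs rather than how long the utterance took. A smaller value would then correspond to "shortening the distance from the model".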