Abstract: Accurate pronunciation is one of the key factors that help second language learners improve their communication skills. For this reason, systems for detecting mispronunciations or assessing the pronunciation accuracy of second language learners have been proposed and studied for decades. However, the results achieved so far remain limited. In this paper, we apply two popular deep learning models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network, to the problem of predicting mispronunciations by Vietnamese learners of English. This work supports the development of systems that help Vietnamese people improve their English pronunciation as they acquire the language. Experimental results on the L2-ARCTIC dataset show that both models achieve state-of-the-art performance. In addition, the LSTM model outperforms the CNN model by 6.3% in accuracy, which we attribute to the memory mechanism in each LSTM unit. The source code of our approach is available at https://github.com/vdquang1991/Mispronounce_Prediction.