A Study of Mispronunciation Detection and Diagnosis Based on Meta-Learning

Yukai Wan, Yuqi Shi, Binghuai Lin, Yanlu Xie

Published: 2024, Last Modified: 30 Sept 2024, Venue: ICASSP 2024, License: CC BY-SA 4.0

Abstract: Most current mispronunciation detection and diagnosis (MD&D) methods rely on manually annotated data for model training. However, annotating the mispronunciations of second language (L2) learners is costly, so data scarcity is a significant challenge in MD&D. In this paper, we employ model-agnostic meta-learning (MAML) to train a phoneme recognition model for MD&D. We experiment with different meta-learning task partitioning and training strategies to give the model the ability to adapt rapidly to unfamiliar speakers. Our best-performing method achieves an F-measure of 61.45%, surpassing two related approaches that also address data scarcity in MD&D: fine-tuning the pre-trained wav2vec 2.0 model, and incorporating the reference text during training. Notably, even with few-shot fine-tuning, our model still achieves strong F-measure results, suggesting that meta-learning is effective for MD&D tasks.
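The abstract does not specify the model architecture, the exact task partitioning, or any hyperparameters, so the following is only a minimal sketch of MAML's two-level optimization under stated assumptions. Each meta-learning task is assumed to be one speaker, and a toy linear regressor stands in for the phoneme recognizer so that the exact (second-order) MAML meta-gradient can be written in closed form. All names (`sample_speaker_task`, `INNER_LR`, `OUTER_LR`, `W_SHARED`) and all data are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical toy setup: every "speaker" task has target weights equal to a
# shared vector plus a small speaker-specific offset. A linear model replaces
# the paper's phoneme recognizer purely to keep MAML's math explicit.
DIM, K_SUPPORT, K_QUERY = 5, 10, 10
INNER_LR, OUTER_LR = 0.1, 0.05  # assumed inner (alpha) and outer (beta) step sizes
W_SHARED = np.array([1.0, -2.0, 0.5, 1.5, -1.0])


def sample_speaker_task(rng):
    """Return (X_s, y_s, X_q, y_q): support and query sets for one toy speaker."""
    w_task = W_SHARED + 0.1 * rng.normal(size=DIM)
    X_s = rng.normal(size=(K_SUPPORT, DIM))
    X_q = rng.normal(size=(K_QUERY, DIM))
    return X_s, X_s @ w_task, X_q, X_q @ w_task


def mse_grad_hess(w, X, y):
    """Gradient and Hessian of mean((Xw - y)^2) with respect to w."""
    n = X.shape[0]
    grad = (2.0 / n) * X.T @ (X @ w - y)
    hess = (2.0 / n) * X.T @ X
    return grad, hess


def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))


def adapt(w, X_s, y_s):
    """Inner loop: one gradient step on a speaker's support set."""
    grad, _ = mse_grad_hess(w, X_s, y_s)
    return w - INNER_LR * grad


def maml_train(steps=500, meta_batch=4, seed=1):
    """Outer loop: learn an initialization that adapts well after one inner step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(DIM)
    for _ in range(steps):
        meta_grad = np.zeros(DIM)
        for _ in range(meta_batch):
            X_s, y_s, X_q, y_q = sample_speaker_task(rng)
            w_adapted = adapt(w, X_s, y_s)
            _, hess_s = mse_grad_hess(w, X_s, y_s)
            grad_q, _ = mse_grad_hess(w_adapted, X_q, y_q)
            # Exact MAML gradient for this linear model (H_s is symmetric):
            # d/dw L_q(w - alpha * g_s(w)) = (I - alpha * H_s) @ grad L_q(w_adapted)
            meta_grad += (np.eye(DIM) - INNER_LR * hess_s) @ grad_q
        w -= OUTER_LR * meta_grad / meta_batch
    return w


def post_adaptation_loss(w_init, n_tasks=100, seed=2):
    """Mean query loss on unseen toy speakers after one adaptation step."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_tasks):
        X_s, y_s, X_q, y_q = sample_speaker_task(rng)
        losses.append(mse(adapt(w_init, X_s, y_s), X_q, y_q))
    return float(np.mean(losses))


w_meta = maml_train()
meta_loss = post_adaptation_loss(w_meta)
scratch_loss = post_adaptation_loss(np.zeros(DIM))
print(f"1-step adaptation loss: MAML init {meta_loss:.4f}, zero init {scratch_loss:.4f}")
```

Comparing the meta-learned initialization with a zero initialization illustrates, in miniature, the idea of an initialization that adapts quickly to unseen speakers from a few examples. It is not a reproduction of the paper's setup or results; a real system would apply the same inner/outer loop to a neural phoneme recognizer, typically with automatic differentiation instead of closed-form Hessians.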