Abstract: Conformer-based models have recently shown promising results in automatic speech recognition (ASR). However, there is still a dearth of research on Conformer-based models for computer-assisted pronunciation training (CAPT) systems. In this paper, we introduce a Conformer-based CAPT system that provides mispronunciation detection and diagnosis. We use the Conformer as the main phoneme-level pronunciation error detection model because of its superior phoneme recognition performance. Features extracted from the Conformer decoder, including the Log Phone Posterior (LPP), the Log Posterior Ratio (LPR), and several others, are then used to train an XGBoost model that predicts the phoneme- and sentence-level scores labeled by experts. Results on both open datasets and our internal Chinese children's data demonstrate that the Conformer-based system, which has a smaller model size and provides detailed diagnosis, outperforms a neural network (NN)-based system.
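The abstract names LPP and LPR without defining them. As a rough illustration only, the sketch below uses the definitions common in goodness-of-pronunciation work: LPP is the average log posterior of the canonical phone over its aligned segment, and LPR is the difference between each phone's average log posterior and that of the canonical phone. The function name `lpp_lpr_features`, the `(steps, phones)` array layout, and the assumption that per-step log posteriors can be read from the decoder are all hypothetical and not taken from the paper.

```python
import numpy as np


def lpp_lpr_features(log_post, canonical):
    """Compute LPP and LPR for one phone segment (illustrative sketch).

    log_post  : array of shape (T, P), log phone posteriors for the T steps
                aligned to this segment (assumed layout, not from the paper).
    canonical : index of the phone the speaker was expected to produce.

    Returns (lpp, lpr): lpp is a float, lpr is a length-P vector whose
    entry j is the log posterior ratio of phone j versus the canonical phone.
    """
    log_post = np.asarray(log_post, dtype=float)
    # Average the log posteriors over the segment, one value per phone.
    mean_log_post = log_post.mean(axis=0)
    # LPP: averaged log posterior of the canonical phone.
    lpp = float(mean_log_post[canonical])
    # LPR: every phone's averaged log posterior relative to the canonical one.
    lpr = mean_log_post - lpp
    return lpp, lpr


# Toy segment: two steps, three phones, canonical phone is index 0.
toy = np.log([[0.5, 0.25, 0.25],
              [0.5, 0.25, 0.25]])
lpp, lpr = lpp_lpr_features(toy, canonical=0)
```

In a pipeline like the one described, vectors of this kind would be concatenated per phoneme and passed to a gradient-boosted regressor such as XGBoost, with the expert scores as targets; the exact feature set and training setup are not specified in the abstract.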