HMamba: Towards Multifaceted Computer-assisted Pronunciation Training Leveraging Hierarchical Selective State Space Model and Decoupled Cross-entropy Loss

ACL ARR 2024 June Submission1867 Authors

15 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0

Abstract: Prior efforts in building computer-assisted pronunciation training (CAPT) systems often treat automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD) as separate fronts. APA aims to provide multiple pronunciation aspect scores across diverse linguistic levels, while MDD focuses instead on pinpointing the precise phonetic errors made by non-native language learners. However, a full-fledged CAPT system should integrate both features simultaneously. To address this need, we first propose HMamba, a novel hierarchical selective state space method that jointly tackles the APA and MDD tasks. In addition, to enhance model performance, we introduce a novel loss function, decoupled cross-entropy loss (deXent), specifically tailored for the MDD task to facilitate better supervised label learning. A comprehensive set of experiments conducted on the speechocean762 benchmark dataset demonstrates the effectiveness of our approach in multi-aspect, multi-granular assessment. Furthermore, our proposed approach yields a considerable improvement in MDD performance over a competitive baseline, achieving an F1-score of 63.32%.

Paper Type: Long

Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding

Research Area Keywords: multi-task learning, self-supervised learning, optimization methods, automatic speech recognition, educational applications, speech technologies

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings

Languages Studied: English, Mandarin

Submission Number: 1867
