Comparative Analysis of Acoustic Perception Models in Simulation of Teacher-Learner Interaction in L2 Pronunciation Learning

ACL ARR 2024 June Submission679 Authors

12 Jun 2024 (modified: 19 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: This study presents a comparative analysis of acoustic perception models in simulating teacher-learner interaction for second language (L2) English pronunciation learning, focusing on Chinese native speakers. Three acoustic perception models are evaluated: an English model (M1) based on the XLS-R framework and fine-tuned on the TIMIT corpus, a non-native model (M2) also based on XLS-R but fine-tuned on the L2-ARCTIC corpus, and a Chinese model (M3) using a sequence-to-sequence architecture with connectionist temporal classification (CTC) fine-tuned on the AISHELL-1 corpus. A corpus of seven pseudo-words designed to challenge Chinese learners of English is used to assess the models' performance in capturing the acoustic perception of L2 learners. The Levenshtein distance between recognised sequences and reference sequences for Chinese and English speakers is employed as an evaluation metric, along with the ratio of these distances. Results show that the non-native model (M2) outperforms the English (M1) and Chinese (M3) models in minimising the Levenshtein distance for Chinese speakers and achieves the lowest ratio, indicating its effectiveness in modelling the acoustic perception of L2 learners. These findings suggest that incorporating non-native speech data in acoustic perception models can improve the simulation of teacher-learner interaction in L2 pronunciation learning.
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Acoustic Perception Models, L2 Learning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 679
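
The evaluation metric named in the abstract, a Levenshtein distance between a recognised sequence and each of two reference sequences, together with the ratio of the two distances, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the example phoneme sequences, and in particular the direction of the ratio (distance to the Chinese reference divided by distance to the English reference) are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Optional, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two token sequences (unit cost for
    insertion, deletion and substitution)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_ratio(
    recognised: Sequence[str],
    ref_chinese: Sequence[str],
    ref_english: Sequence[str],
) -> Tuple[int, int, Optional[float]]:
    """Return both distances and their ratio.

    ASSUMPTION: ratio = d_chinese / d_english. The ratio is None when
    the English distance is zero, since it is then undefined.
    """
    d_zh = levenshtein(recognised, ref_chinese)
    d_en = levenshtein(recognised, ref_english)
    ratio = d_zh / d_en if d_en > 0 else None
    return d_zh, d_en, ratio


# Hypothetical phoneme sequences, not taken from the paper's corpus.
recognised = ["s", "i", "n", "k"]
ref_chinese = ["s", "i", "n", "k"]
ref_english = ["th", "i", "ng", "k"]
result = distance_ratio(recognised, ref_chinese, ref_english)
```

Operating on lists of phoneme symbols rather than raw strings keeps multi-character symbols such as "th" or "ng" as single edit units.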