Abstract: We explore whether word-level transcription can be used to detect the phoneme mispronunciation tendencies of non-native English speakers (NNES). We focus on word-level rather than phoneme-level transcription because the former is mature and readily accessible. We define a phoneme mispronunciation tendency as the recurring imperfect pronunciation of a phoneme across different words. We use an Automatic Speech Recognition (ASR) service to generate alternative transcripts from audio recordings of speakers reading aloud. We build features from the divergence between the audio transcriptions and the texts being read, as well as from the confidence of the transcriptions. We find that these features are informative for detecting phoneme mispronunciation tendencies.
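To make the feature idea concrete, here is a minimal Python sketch of how divergence and confidence features might be computed. It is an illustration under stated assumptions, not the authors' feature set: the function name `divergence_features`, the input format (a list of `(transcript, confidence)` pairs per utterance), and the two features shown are all hypothetical. The alignment uses the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def divergence_features(reference, alternatives):
    """Compare a reference text with alternative ASR transcripts.

    reference: the text the speaker was asked to read aloud.
    alternatives: list of (transcript, confidence) pairs; this input
        format is an assumption, not a specific ASR service's schema.
    """
    if not alternatives:
        raise ValueError("at least one alternative transcript is required")

    ref_words = reference.lower().split()
    mismatch_counts = [0] * len(ref_words)

    for transcript, _confidence in alternatives:
        hyp_words = transcript.lower().split()
        matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)

        # Mark every reference word that aligns with an identical word.
        matched = [False] * len(ref_words)
        for block in matcher.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                matched[i] = True

        # Count reference words the transcript failed to reproduce.
        for i, ok in enumerate(matched):
            if not ok:
                mismatch_counts[i] += 1

    n_alt = len(alternatives)
    return {
        # Fraction of alternatives in which each reference word diverged.
        "word_mismatch_rate": [
            (word, count / n_alt)
            for word, count in zip(ref_words, mismatch_counts)
        ],
        # Average confidence over all alternatives.
        "mean_confidence": sum(c for _, c in alternatives) / n_alt,
    }


if __name__ == "__main__":
    features = divergence_features(
        "the ship is thin",
        [("the sheep is thin", 0.9), ("the ship is tin", 0.6), ("the ship is thin", 0.8)],
    )
    print(features)
```

Words that diverge often across alternatives could then be mapped to their phonemes with a pronunciation dictionary and aggregated per phoneme across words, which is one plausible way to surface a recurring tendency. That step is omitted here because the abstract does not specify it.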