Automatic detection of voice onset time contrasts for use in pronunciation assessment

Abe Kazemzadeh

Published: 01 Dec 2006 (Interspeech), Last Modified: 17 Dec 2024, License: CC BY-SA 4.0

Abstract: This study examines methods for recognizing different classes of phones in accented speech on the basis of voice onset time (VOT). The methods are tested on data from the Tball corpus of Los Angeles-area elementary school children [1]. The three methods proposed and tested are: 1) training models on standard English VOT contrasts and then extracting the VOT characteristics of each phone by measuring the durations of phone-level and sub-phone-level alignments; 2) training phone models with explicit aspiration; and 3) training separate models for phoneme classes with different VOT characteristics. Error rates across phone classes range from 23–53% for the first method, 5–57% for the second, and 0–36% for the third. The results show that different methods work better on different phone classes. We interpret these results in relation to past research on VOT, explain possible uses for these findings, and propose directions for future research.
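The first method reads VOT off the durations of phone-level and sub-phone-level alignments. Below is a minimal Python sketch of that measurement step and of computing per-class error rates. It assumes a hypothetical alignment format in which each stop is split into labeled sub-phone segments (`closure`, `burst`, `aspiration`, `voiced`) with start and end times in seconds. VOT is taken as voicing onset minus burst release, so prevoiced stops get negative values. The three-way split into prevoiced, short-lag, and long-lag follows the usual cross-linguistic VOT categories. The 30 ms long-lag threshold, the segment labels, and all function names are illustrative assumptions, not values or code from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    state: str    # hypothetical sub-phone label: "closure", "burst", "aspiration", or "voiced"
    start: float  # seconds
    end: float    # seconds


# Illustrative short-lag / long-lag boundary (an assumption, not a value from the paper).
LONG_LAG_THRESHOLD_S = 0.030


def estimate_vot(segments):
    """VOT = voicing onset minus burst release, read off one stop's sub-phone alignment.

    A negative value means voicing began before the release (prevoicing).
    """
    release = min((s.start for s in segments if s.state == "burst"), default=None)
    voicing = min((s.start for s in segments if s.state == "voiced"), default=None)
    if release is None or voicing is None:
        raise ValueError("alignment needs both a 'burst' and a 'voiced' segment")
    return voicing - release


def classify_vot(vot):
    """Map a VOT value in seconds to one of three lag categories."""
    if vot < 0:
        return "prevoiced"
    return "long-lag" if vot >= LONG_LAG_THRESHOLD_S else "short-lag"


def per_class_error_rates(items):
    """items: (reference_class, segments) pairs; returns the error rate per reference class."""
    totals, errors = defaultdict(int), defaultdict(int)
    for ref, segments in items:
        totals[ref] += 1
        if classify_vot(estimate_vot(segments)) != ref:
            errors[ref] += 1
    return {c: errors[c] / totals[c] for c in totals}


# Usage with made-up alignments (hypothetical timings).
aspirated_p = [Segment("closure", 0.10, 0.18), Segment("burst", 0.18, 0.19),
               Segment("aspiration", 0.19, 0.24), Segment("voiced", 0.24, 0.30)]
prevoiced_b = [Segment("voiced", 0.05, 0.12), Segment("burst", 0.12, 0.13),
               Segment("voiced", 0.13, 0.20)]

print(classify_vot(estimate_vot(aspirated_p)))  # long-lag
print(classify_vot(estimate_vot(prevoiced_b)))  # prevoiced
print(per_class_error_rates([("long-lag", aspirated_p), ("short-lag", aspirated_p)]))
# {'long-lag': 0.0, 'short-lag': 1.0}
```

Taking the earliest `voiced` segment as voicing onset lets the same function cover both lag and lead (prevoiced) stops. A single fixed threshold is the simplest choice; a per-language or per-speaker boundary would be a natural refinement.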