Keywords: phoneme recognition, speech recognition errors, phonological features, feature-based error analysis, self-supervised speech models, cross-linguistic analysis, phoneme confusions, interpretability of ASR
Abstract: ASR errors are typically analyzed at the phoneme level, treating phonemes as atomic symbols. In this work, we instead adopt a featural representation of phonemes, grounded in phonological theory, which models speech sounds as structured bundles of distinctive articulatory and acoustic properties. This perspective allows us to analyze recognition errors at a finer granularity and to investigate whether certain phonological features are more vulnerable than others. Across multiple languages, we show that phoneme confusions are strongly structured in phonological feature space: errors are predominantly local and exhibit systematic asymmetries that reveal a small set of weakly modeled features. These findings have direct implications both for the design and diagnosis of ASR systems and for cognitive models of human speech perception, where similar feature-level asymmetries have long been observed.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: phoneme recognition, speech recognition errors, phonological features, feature-based error analysis, self-supervised speech models, cross-linguistic analysis, phoneme confusions, interpretability of ASR
Contribution Types: Model analysis & interpretability
Languages Studied: Dutch, English, Finnish, French, Indonesian, Italian, Maltese, Polish, Swedish, Tamil, Thulung, Turkish
Submission Number: 5635