What Do Neural Speech Models Know About Phonology? Evidence from Structured Phoneme Confusions

ACL ARR 2026 January Submission 5635 Authors

05 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0

Keywords: phoneme recognition, speech recognition errors, phonological features, feature-based error analysis, self-supervised speech models, cross-linguistic analysis, phoneme confusions, interpretability of ASR

Abstract: ASR errors are typically analyzed at the phoneme level, treating phonemes as atomic symbols. In this work, we instead adopt a featural representation of phonemes, grounded in phonological theory, which models speech sounds as structured bundles of distinctive articulatory and acoustic properties. This perspective allows us to analyze recognition errors at a finer granularity and to investigate whether certain phonological features are more vulnerable than others. Across multiple languages, we show that phoneme confusions are strongly structured in phonological feature space: errors are predominantly local and exhibit systematic asymmetries that reveal a small set of weakly modeled features. These findings have direct implications both for the design and diagnosis of ASR systems and for cognitive models of human speech perception, where similar feature-level asymmetries have long been observed.

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: phoneme recognition, speech recognition errors, phonological features, feature-based error analysis, self-supervised speech models, cross-linguistic analysis, phoneme confusions, interpretability of ASR

Contribution Types: Model analysis & interpretability

Languages Studied: Dutch, English, Finnish, French, Indonesian, Italian, Maltese, Polish, Swedish, Tamil, Thulung, Turkish

Submission Number: 5635
