Artificial Neural Networks Generate Human-like Continuous Speech Perception

Published: 10 Oct 2024, Last Modified: 04 Nov 2024
Venue: UniReps
License: CC BY 4.0
Track: Extended Abstract Track
Keywords: Speech Perception, Artificial Neural Networks, Phoneme Recognition, Behavioral Experiment, NeuroAI
Abstract: Humans have a remarkable ability to convert acoustic signals into linguistic representations. To advance toward the goal of building biologically plausible models that replicate this process, we developed an artificial neural network trained to generate sequences of American English phonemes from audio processed by a simulated cochlea. We trained the model with phoneme transcriptions inferred from text annotations of speech corpora. To compare the model to humans, we ran a behavioral experiment in which humans transcribed non-words, and evaluated the model on the same stimuli. While humans slightly outperformed the model, the model exhibited human-like patterns of phoneme confusions for consonants (r=0.91) and vowels (r=0.87). Additionally, the recognizability of individual phonemes was highly correlated (r=0.93) between humans and the model. These results suggest that human-like speech perception emerges from optimizing for phoneme recognition from cochlear representations.
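To make the reported comparison concrete, below is a minimal, hypothetical sketch (not the authors' code) of how human and model phoneme confusion patterns could be compared with a Pearson correlation, as in the reported consonant (r=0.91) and vowel (r=0.87) results. The function names, the row-normalization, and the choice to correlate the full flattened confusion matrices are illustrative assumptions; the abstract does not specify how confusions were aggregated or whether correct responses (the diagonal) were included.

```python
# Hypothetical sketch: correlating human vs. model phoneme confusion matrices.
import numpy as np
from scipy.stats import pearsonr

def confusion_matrix(presented, responded, phoneme_set):
    """Row-normalized confusion matrix: P(response phoneme | presented phoneme)."""
    index = {p: i for i, p in enumerate(phoneme_set)}
    counts = np.zeros((len(phoneme_set), len(phoneme_set)))
    for true_ph, resp_ph in zip(presented, responded):
        counts[index[true_ph], index[resp_ph]] += 1
    # Normalize each row so responses to a presented phoneme sum to 1.
    return counts / counts.sum(axis=1, keepdims=True)

def confusion_correlation(human_conf, model_conf):
    """Pearson r between flattened human and model confusion matrices."""
    return pearsonr(human_conf.ravel(), model_conf.ravel())[0]
```

Under these assumptions, one would build one matrix from human non-word transcriptions and one from model outputs on the same stimuli, then report the resulting r separately for consonants and vowels.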
Submission Number: 46