Mapping Whisper Representations to Human ECoG Responses with Interpretable Time-Resolved Neural Encoding
Track: long paper (up to 10 pages)
Domain: cognitive science
Abstract: Understanding how hierarchical speech representations map onto human cortical activity is a central challenge in computational neuroscience. In this work, we study how internal representations from Whisper, a large-scale speech recognition model, predict intracranial electrocorticography (ECoG) responses during naturalistic speech perception. We introduce a time-resolved neural encoding model that aligns Whisper embeddings to word-locked cortical responses using a recurrent architecture with soft temporal attention. Our results show that intermediate Whisper layers consistently provide the best predictions of neural activity, revealing a correspondence between model hierarchy and cortical speech processing. In addition, a phonemic interpretability analysis uncovers anatomically coherent, phoneme-selective clusters in superior temporal cortex, providing converging evidence that intermediate speech model representations capture neural computations underlying human speech perception.
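The abstract describes an encoding model that maps a window of Whisper-layer embeddings to a word-locked cortical response via a recurrent network with soft temporal attention. The paper does not specify the architecture beyond this, so the following is only a minimal numpy sketch of that general recipe; all dimensions, weight shapes, and the GRU-style recurrence are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T audio frames per word, D-dim Whisper embeddings,
# H-dim recurrent state, E electrodes. None of these come from the paper.
T, D, H, E = 50, 384, 64, 16

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Randomly initialized (untrained) weights, for shape illustration only.
Wz, Uz = rng.normal(0, 0.01, (H, D)), rng.normal(0, 0.01, (H, H))
Wh, Uh = rng.normal(0, 0.01, (H, D)), rng.normal(0, 0.01, (H, H))
w_att = rng.normal(0, 0.01, H)          # attention scoring vector
W_out = rng.normal(0, 0.01, (E, H))     # linear readout to electrodes

def encode(X):
    """Map (T, D) Whisper embeddings to E predicted electrode responses."""
    h = np.zeros(H)
    states = np.zeros((T, H))
    for t in range(T):                                   # simple gated recurrence
        z = 1.0 / (1.0 + np.exp(-(Wz @ X[t] + Uz @ h)))  # update gate
        h_cand = np.tanh(Wh @ X[t] + Uh @ h)             # candidate state
        h = (1 - z) * h + z * h_cand
        states[t] = h
    a = softmax(states @ w_att)   # soft temporal attention over frames
    context = a @ states          # attention-weighted summary of the window
    return W_out @ context        # word-locked prediction, one value per electrode

X = rng.normal(size=(T, D))       # stand-in for one word's Whisper embeddings
y_hat = encode(X)
print(y_hat.shape)                # one predicted response per electrode: (16,)
```

In this scheme the attention weights make the model time-resolved: they indicate which audio frames around word onset drive each prediction, which is what enables the layer-wise and phonemic interpretability analyses the abstract reports.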
Presenter: ~Matteo_Ferrante1
Submission Number: 35