TL;DR: We propose neuroscience-inspired self-supervised objectives that enable learning from heterogeneous and unlabelled neural recordings, unlocking the potential for training speech decoding models with significantly more existing MEG data.
Abstract: The past few years have seen remarkable progress in decoding speech from brain activity, primarily driven by large single-subject datasets. However, individual variation (e.g., in anatomy) and differences in task design and scanning hardware make it challenging to leverage data across subjects and datasets. As a result, the field has been unable to exploit the growing number of open neural data repositories for large-scale deep learning. To address this, we develop neuroscience-informed self-supervised objectives, together with an architecture, for learning from heterogeneous brain recordings. Scaling to nearly **400 hours** of MEG data and **900 subjects**, our approach shows generalisation across participants, datasets, tasks, and even to *novel* subjects. It achieves **improvements of 15-27%** over state-of-the-art models and **matches *surgical* decoding performance with *non-invasive* data**. These advances unlock the potential for scaling speech decoding models beyond the current frontier.
Lay Summary: More and more data recorded from participants with safe brain imaging devices are appearing publicly on the internet, yet we lack methods that can effectively train AI on all of these data together, because each data source collects its data differently and provides different labels. We built a method that resolves the differences between these data sources and works without labels, allowing all of the publicly available brain imaging data to be used to train AI. In turn, this unblocks the path to building better AI-driven brain-computer interfaces by collecting and combining even more data. We hope to eventually use these methods to help paralysed patients communicate through AI that deciphers their intended speech from their brain signals.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/neural-processing-lab/the-brains-bitter-lesson
Primary Area: Applications->Neuroscience, Cognitive Science
Keywords: neural decoding, speech decoding, neuroscience
Submission Number: 10851