OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

ICLR 2026 Conference Submission 23059 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026 · ICLR 2026 · CC BY 4.0
Keywords: Scaling laws, Multimodal Transformers, Foundation models, Visual cortex, Neural encoding models, Neural decoding, Behavioral prediction, Calcium imaging
TL;DR: Using an unprecedented dataset of over three million single-neuron recordings, we demonstrate a clear power-law relationship between transformer model scale and performance on neural encoding and decoding tasks.
Abstract: Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leverage a dataset of 3.3 million neurons from the visual cortex of 78 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during natural movies, images, parametric stimuli, and behavior. We train multi-modal, multi-task transformer models (1M–300M parameters) that flexibly support three regimes at test time: neural prediction (predicting neuronal responses from sensory input and behavior), behavioral decoding (predicting behavior from neural activity), neural forecasting (predicting future activity from current neural dynamics), and any combination of the three. We find that performance scales reliably with more data, but gains from increasing model size saturate -- suggesting that current brain models are limited by data rather than compute. This inverts the standard AI scaling story: in language and computer vision, massive datasets make parameter scaling the primary driver of progress, whereas in brain modeling -- even in the mouse visual cortex, a relatively simple and low-resolution system -- models remain data-limited despite vast recordings. These findings highlight the need for richer stimuli, tasks, and larger-scale recordings to build brain foundation models. The observation of systematic scaling raises the possibility of phase transitions in neural modeling, where larger and richer datasets might unlock qualitatively new capabilities, paralleling the emergent properties seen in large language models.
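The power-law relationship the TL;DR refers to is typically estimated by fitting performance against scale on log-log axes. The sketch below is purely illustrative -- it is not the authors' fitting procedure, and the token counts and scores are hypothetical -- but it shows the standard approach: a linear least-squares fit in log space recovers the exponent and prefactor of y = a * x^b.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x^b via least squares in log-log space.

    Returns (a, b): the prefactor and the scaling exponent.
    """
    log_x, log_y = np.log(x), np.log(y)
    b, log_a = np.polyfit(log_x, log_y, 1)  # slope = exponent b
    return np.exp(log_a), b

# Hypothetical dataset sizes (neural tokens) and scores drawn
# from an assumed power law y = 0.1 * x^0.25 for illustration.
tokens = np.array([1e9, 1e10, 1e11, 1.5e11])
score = 0.1 * tokens ** 0.25

a, b = fit_power_law(tokens, score)
```

On synthetic data that exactly follows a power law, the fit recovers the generating parameters; on real scaling curves one would also inspect residuals, since saturation with model size (as the abstract reports) shows up as systematic curvature away from the straight line in log-log space.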
Primary Area: applications to neuroscience & cognitive science
Submission Number: 23059