OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

ICLR 2026 Conference Submission 23059 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026 · ICLR 2026 · CC BY 4.0
Keywords: Scaling laws, Multimodal Transformers, Foundation models, Visual cortex, Neural encoding models, Neural decoding, Behavioral prediction, Calcium imaging
TL;DR: Using an unprecedented dataset of over three million single-neuron recordings, we demonstrate a clear power-law relationship between transformer model scale and performance on neural encoding and decoding tasks.
Abstract: Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leverage a dataset of 3.3 million neurons from the visual cortex of 78 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during natural movies, images, parametric stimuli, and behavior. We train multi-modal, multi-task transformer models (1M–300M parameters) that flexibly support three regimes at test time: neural prediction (predicting neuronal responses from sensory input and behavior), behavioral decoding (predicting behavior from neural activity), neural forecasting (predicting future activity from current neural dynamics), and any combination of the three. We find that performance scales reliably with more data, but gains from increasing model size saturate -- suggesting that current brain models are limited by data rather than compute. This inverts the standard AI scaling story: in language and computer vision, massive datasets make parameter scaling the primary driver of progress, whereas in brain modeling -- even in the mouse visual cortex, a relatively simple and low-resolution system -- models remain data-limited despite vast recordings. These findings highlight the need for richer stimuli, tasks, and larger-scale recordings to build brain foundation models. The observation of systematic scaling raises the possibility of phase transitions in neural modeling, where larger and richer datasets might unlock qualitatively new capabilities, paralleling the emergent properties seen in large language models.
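The power-law relationship the TL;DR refers to is typically estimated by fitting performance against scale on log-log axes. The sketch below is purely illustrative -- it is not the authors' fitting procedure, and the token counts and scores are hypothetical -- but it shows the standard approach: a linear least-squares fit in log space recovers the exponent and prefactor of y = a * x^b.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x^b via least squares in log-log space.

    Returns (a, b): the prefactor and the scaling exponent.
    """
    log_x, log_y = np.log(x), np.log(y)
    b, log_a = np.polyfit(log_x, log_y, 1)  # slope = exponent b
    return np.exp(log_a), b

# Hypothetical dataset sizes (neural tokens) and scores drawn
# from an assumed power law y = 0.1 * x^0.25 for illustration.
tokens = np.array([1e9, 1e10, 1e11, 1.5e11])
score = 0.1 * tokens ** 0.25

a, b = fit_power_law(tokens, score)
```

On synthetic data that exactly follows a power law, the fit recovers the generating parameters; on real scaling curves one would also inspect residuals, since saturation with model size (as the abstract reports) shows up as systematic curvature away from the straight line in log-log space.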
Primary Area: applications to neuroscience & cognitive science
Submission Number: 23059