The Neural Pile: 476 billion tokens of broad-coverage spiking neural activity data

Published: 23 Sept 2025, Last Modified: 30 Oct 2025, NeurIPS 2025 Workshop BrainBodyFM, CC BY 4.0
Keywords: neural foundation models, systems neuroscience, spiking neural activity, computational neuroscience, large-scale distributed training, neural prediction
Abstract: Foundation models pretrained on large, rich, domain-specific datasets facilitate scientific discovery and technological advances. Systems neuroscience currently lacks such foundation models, mainly because of two obstacles: (i) a lack of large-scale datasets and (ii) a scarcity of the large-scale compute needed to train high-capacity models. Here, we aim to address both challenges. First, we introduce the Neural Pile, a large-scale curated dataset of spiking neural activity recorded from both primates and rodents. The dataset contains 34B uncompressed tokens of neural data from primates and 441B uncompressed tokens of neural data from rodents, spanning multiple species and covering a wide range of brain regions, behaviors, and tasks. We also provide a separate test split intended as a challenging neural prediction benchmark for evaluating neural foundation models. Second, as a strong baseline on this benchmark, we release large-scale models (8B-parameter models with a context length of 131k tokens) pretrained on the Neural Pile.
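To put the stated scale in perspective, the short sketch below does the arithmetic on the figures given in the abstract. The two rounded per-species counts sum to 475B tokens; the 476B in the title presumably reflects the unrounded counts. Reading "131k tokens" as 2^17 = 131,072 is an assumption, and the window count ignores any real packing constraints (for example, sequences not crossing recording boundaries), so it is only an upper-bound style estimate, not a description of how the models were actually trained. The names `corpus_summary`, `PRIMATE_TOKENS`, and the others are hypothetical.

```python
# Token counts as stated (rounded) in the abstract.
PRIMATE_TOKENS = 34_000_000_000
RODENT_TOKENS = 441_000_000_000

# Assumption: "131k tokens" is interpreted as 2**17 = 131,072 tokens.
CONTEXT_LENGTH = 2 ** 17


def corpus_summary(primate: int, rodent: int, context: int) -> dict:
    """Return the total token count, the rodent share of the corpus, and
    how many non-overlapping, full-length context windows the total fills
    (hypothetical helper; ignores recording boundaries and packing)."""
    total = primate + rodent
    return {
        "total_tokens": total,
        "rodent_fraction": rodent / total,
        "full_windows": total // context,
    }


if __name__ == "__main__":
    summary = corpus_summary(PRIMATE_TOKENS, RODENT_TOKENS, CONTEXT_LENGTH)
    print(f"total tokens: {summary['total_tokens']:,}")
    print(f"rodent share: {summary['rodent_fraction']:.1%}")
    print(f"full windows: {summary['full_windows']:,}")
```

Under these assumptions the rodent data make up roughly 93% of the corpus, and the total would fill about 3.6 million full-length context windows.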
Submission Number: 66