Keywords: neural foundation model, pretraining, scaling law, self-supervised learning, neural encoding, neural dynamics, brain-computer interfaces
TL;DR: We show that the benefits of pretraining multi-session neural data transformers are highly sensitive to data and session selection.
Abstract: A key challenge in analyzing neuroscience datasets is the profound variability they exhibit across sessions, animals, and data modalities—i.e., heterogeneity. Several recent studies have demonstrated performance gains from pretraining neural foundation models on multi-session datasets, seemingly overcoming this challenge. However, these studies typically lack fine-grained data scaling analyses. It remains unclear how different sources of heterogeneity influence model performance as the amount of pretraining data increases, and whether all sessions contribute equally to downstream performance gains. In this work, we systematically investigate how data heterogeneity impacts the scaling behavior of neural data transformers (NDTs) in neural activity prediction. We found that explicit sources of heterogeneity, such as brain region mismatches among sessions, reduced the scaling benefits for neuron-level and region-level activity prediction performance. For tasks that did exhibit consistent scaling, we identified implicit data heterogeneity arising from cross-session variability. Through our proposed session-selection procedure, models pretrained on as few as five selected sessions outperformed those pretrained on the entire dataset of 84 sessions. Our findings challenge the direct applicability of traditional scaling laws to neural data and suggest that prior reports of multi-session scaling benefits may need to be re-examined in light of data heterogeneity. This work both highlights the importance of incremental data scaling analyses and suggests new avenues toward optimally selecting pretraining data when developing foundation models on large-scale neuroscience datasets.
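The abstract does not specify how the session-selection procedure works. As a purely illustrative sketch (not the authors' method), one common way to pick a small pretraining subset is a greedy search over sessions driven by held-out validation performance; the callable `pretrain_and_evaluate` and the selection budget below are hypothetical placeholders.

```python
# Illustrative sketch only: a generic greedy session-selection loop based on
# held-out validation performance. `pretrain_and_evaluate` is a hypothetical
# callable that pretrains a model on a subset of sessions and returns a
# downstream activity-prediction score (higher is better).

from typing import Callable, List, Sequence


def greedy_session_selection(
    sessions: Sequence[str],
    pretrain_and_evaluate: Callable[[List[str]], float],
    budget: int = 5,
) -> List[str]:
    """Greedily select up to `budget` sessions that maximize a validation score."""
    selected: List[str] = []
    remaining = list(sessions)
    best_score = float("-inf")

    while remaining and len(selected) < budget:
        # Score each candidate session when added to the current subset.
        candidate_scores = {
            s: pretrain_and_evaluate(selected + [s]) for s in remaining
        }
        best_session, score = max(candidate_scores.items(), key=lambda kv: kv[1])

        # Stop early if no candidate improves on the current best score.
        if score <= best_score:
            break

        selected.append(best_session)
        remaining.remove(best_session)
        best_score = score

    return selected
```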
Submission Number: 8