Scaling Vision Transformers for Functional MRI with Flat Maps
Keywords: functional MRI, foundation models, self-supervised learning, neuroAI
TL;DR: We train masked autoencoder vision transformers on videos of fMRI flat maps and show strict power law scaling and promising downstream decoding.
Abstract: This paper proposes cortical flat maps as a data representation for training functional MRI (fMRI) foundation models. fMRI data in native 4D volume space are first projected onto the cortical surface, then flattened and resampled into sequences of 2D images. We train Vision Transformers on 2.3K hours of fMRI flat-map "videos" from the Human Connectome Project using the spatiotemporal masked autoencoder (MAE) framework. We observe that masked fMRI modeling performance improves with dataset size according to a strict power law. Downstream classification benchmarks show that our model learns rich representations supporting both fine-grained state decoding across subjects and subject-specific trait decoding across changes in brain state. Our code and datasets are available at [links withheld].
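The spatiotemporal MAE setup named in the abstract tokenizes each flat-map clip into space-time patches and trains an encoder on only a small visible subset of them. The sketch below is not the authors' released code; it illustrates that masking step in PyTorch, with the patch sizes, embedding dimension, and 75% mask ratio as illustrative assumptions borrowed from common MAE configurations.

```python
# Hedged sketch of spatiotemporal MAE-style masking over fMRI flat-map
# "videos" shaped (B, 1, T, H, W). Patch sizes, dims, and the mask ratio
# are assumptions, not values taken from the paper.
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Tokenize a clip into spatiotemporal patches via a strided 3D conv."""
    def __init__(self, t=2, p=16, dim=768):
        super().__init__()
        self.proj = nn.Conv3d(1, dim, kernel_size=(t, p, p), stride=(t, p, p))

    def forward(self, x):
        x = self.proj(x)                      # (B, dim, T//t, H//p, W//p)
        return x.flatten(2).transpose(1, 2)   # (B, N, dim) token sequence

def random_masking(tokens, mask_ratio=0.75):
    """Keep a random subset of tokens (MAE-style); return the kept tokens
    and the permutation needed to restore original order at decode time."""
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)
    ids_shuffle = noise.argsort(dim=1)        # random permutation per sample
    ids_restore = ids_shuffle.argsort(dim=1)  # its inverse, for the decoder
    ids_keep = ids_shuffle[:, :n_keep]
    kept = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return kept, ids_restore

# Example: a 16-frame, 224x224 flat-map clip; the encoder sees ~25% of tokens.
clip = torch.randn(2, 1, 16, 224, 224)
tokens = PatchEmbed3D()(clip)                 # (2, 1568, 768)
visible, ids_restore = random_masking(tokens) # (2, 392, 768)
```

The reported scaling behavior is consistent with fits of the common form $L(N) \approx a\,N^{-\alpha}$, where $L$ is the masked-reconstruction loss and $N$ is the pretraining dataset size; the exact functional form the authors fit is not stated in this abstract.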
Submission Number: 67