Keywords: fMRI, foundation model, Hiera, JEPA, Brain
TL;DR: With ~3% of the pretraining data used by prior approaches, we build a voxel-level fMRI foundation model that sets new state-of-the-art results.
Abstract: Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data- and training-efficiency. Atlas-based methods aggregate voxel signals into fixed regions of interest, reducing data dimensionality but discarding fine-grained spatial details and requiring extremely large cohorts to train effectively as general-purpose foundation models. Atlas-free methods, on the other hand, operate directly on voxel-level information: they preserve spatial fidelity but are prohibitively memory- and compute-intensive, making large-scale pre-training infeasible.
We introduce **SLIM-Brain** (**S**ample-efficient, **L**ow-memory fMR**I** Foundation **M**odel for Human **Brain**), a new atlas-free foundation model that simultaneously improves both data- and training-efficiency. SLIM-Brain adopts a two-stage adaptive design: (i) a lightweight temporal extractor captures global context across full sequences and ranks data windows by saliency, and (ii) a 4D hierarchical encoder (Hiera-JEPA) learns fine-grained voxel-level representations only from the top-k selected windows, while discarding about 70\% of patches as masked tokens.
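To make the two-stage selection concrete, below is a minimal PyTorch sketch of saliency-ranked top-k window selection followed by ~70\% patch masking. All names (`select_and_mask`, `saliency`, `k`, `mask_ratio`) and tensor shapes are illustrative assumptions, not the authors' actual API or implementation.

```python
import torch

# Hypothetical sketch (names and shapes are illustrative, not from the paper).
# Stage 1: a lightweight temporal extractor scores each candidate data window;
# Stage 2: only the top-k windows reach the heavy 4D encoder, with ~70% of
# their patch tokens masked out and dropped before encoding.

def select_and_mask(windows: torch.Tensor,
                    saliency: torch.Tensor,
                    k: int,
                    mask_ratio: float = 0.7):
    """windows: (N, P, D) -- N candidate windows of P patch tokens each.
    saliency: (N,) -- per-window scores from the temporal extractor."""
    # Stage 1: keep only the k most salient windows.
    top_idx = saliency.topk(k).indices               # (k,)
    selected = windows[top_idx]                      # (k, P, D)

    # Stage 2: drop ~mask_ratio of patch tokens per window, so the
    # hierarchical encoder only ever sees the visible subset.
    n_patches = selected.shape[1]
    n_keep = int(n_patches * (1.0 - mask_ratio))
    perm = torch.rand(k, n_patches).argsort(dim=1)   # random permutation per window
    keep_idx = perm[:, :n_keep]                      # (k, n_keep)
    visible = torch.gather(
        selected, 1,
        keep_idx.unsqueeze(-1).expand(-1, -1, selected.shape[-1]))
    return visible, top_idx, keep_idx


# Example: 32 candidate windows of 196 patches with 128-dim tokens.
windows = torch.randn(32, 196, 128)
saliency = torch.randn(32)                           # stand-in for extractor scores
visible, top_idx, keep_idx = select_and_mask(windows, saliency, k=8)
print(visible.shape)                                 # torch.Size([8, 58, 128])
```

Because the masked patches are dropped rather than replaced with mask tokens, the encoder's compute and memory scale with the visible ~30\% of tokens only, which is consistent with the memory savings reported below.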
Extensive experiments across seven public benchmarks show that SLIM-Brain establishes new state-of-the-art performance on diverse tasks, while requiring only ~4,000 pre-training sessions and approximately 30\% of the GPU memory compared to traditional voxel-level methods. Code and trained weights of SLIM-Brain are available at [https://anonymous.4open.science/r/SLIM-Brain-9C51](https://anonymous.4open.science/r/SLIM-Brain-9C51).
Primary Area: applications to neuroscience & cognitive science
Submission Number: 5079