Efficient Modeling of Long-range fMRI Dynamics with a 2D Natural Image Autoencoder

15 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: fMRI, 4D, neuroscience, autoencoder
Abstract: Modeling long-range spatiotemporal dynamics in functional Magnetic Resonance Imaging (fMRI) remains a key challenge due to the high dimensionality of the four-dimensional signal. Prior voxel-based models, despite strong performance and interpretability, are constrained by prohibitive memory demands and can therefore capture only limited temporal windows. To address this, we propose TABLeT (Two-dimensionally Autoencoded Brain Latent Transformer), a novel approach that tokenizes fMRI volumes using a pre-trained 2D natural image autoencoder. Each 3D fMRI volume is compressed into a compact set of continuous tokens, enabling efficient long-sequence modeling with a simple Transformer encoder. Across large-scale benchmarks including the UK Biobank (UKB), Human Connectome Project (HCP), and ADHD-200 datasets, TABLeT outperforms existing models on multiple tasks while demonstrating substantial gains in computational and memory efficiency over the state-of-the-art voxel-based method. Furthermore, we show that TABLeT can be pre-trained with a self-supervised masked token modeling approach, improving performance on downstream tasks. Our findings suggest a promising approach for scalable spatiotemporal modeling of brain activity.
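The abstract's tokenization idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the actual autoencoder, latent dimensionality, and volume shapes are not specified in the abstract, so the stand-in encoder below (a random linear projection named `encode_slice`) and all sizes are hypothetical. The sketch only shows the data flow: cut each 3D volume into 2D slices, encode each slice to one continuous token, and stack tokens across slices and timepoints into the sequence a Transformer encoder would consume.

```python
import numpy as np

# Hypothetical sizes: spatial dims (H, W), depth slices D, timepoints T.
H, W, D, T = 64, 64, 48, 200
latent_dim = 16  # assumed token dimensionality, not from the paper

# Stand-in for a pre-trained 2D image autoencoder's encoder:
# here just a fixed random linear projection of a flattened slice.
rng = np.random.default_rng(0)
proj = rng.standard_normal((H * W, latent_dim)) / np.sqrt(H * W)

def encode_slice(slice_2d: np.ndarray) -> np.ndarray:
    """Map one HxW slice to a single continuous latent token."""
    return slice_2d.reshape(-1) @ proj

# One fMRI run: T volumes of shape (H, W, D). Each volume is cut into
# D axial slices, and each slice becomes one token.
fmri = rng.standard_normal((T, H, W, D))
tokens = np.stack([
    np.stack([encode_slice(vol[:, :, d]) for d in range(D)])
    for vol in fmri
])

print(tokens.shape)  # (T, D, latent_dim) = (200, 48, 16)
```

The resulting `(T, D, latent_dim)` token array is far smaller than the raw `(T, H, W, D)` voxel grid, which is the compression that would let a plain Transformer encoder attend over long temporal windows.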
Primary Area: applications to neuroscience & cognitive science
Submission Number: 5477