Track: long paper (up to 8 pages)
Keywords: Magnetoencephalography (MEG), Autoregressive model, Brain foundation models, Vector-quantized tokenization, Sequence modeling, Multimodal foundation models, Token-stream
TL;DR: We introduce a long-context autoregressive brain-token model for source-space MEG that can generate minutes of realistic, conditionally specific neural activity from minutes of context across multiple datasets.
Abstract: We present a large autoregressive model for source-space MEG that extends token-stream "next-X" prediction to brain activity, positioning MEG as an additional modality for multimodal foundation models.
We scale next-brain-token prediction to long context across datasets and scanners, handling a corpus of over 500 hours and thousands of sessions across the three largest MEG datasets.
A modified SEANet-style vector quantizer converts multichannel MEG into a flattened token stream. On this stream we train a Qwen2.5-VL backbone from scratch to predict the next brain token and to recursively generate minutes of MEG from up to a minute of context.
To evaluate long-horizon generation, we introduce task-matched stress tests for (i) on-manifold stability, measured as the drift of generated-only rollouts relative to the time-resolved distribution of real sliding windows, and (ii) conditional specificity, comparing correct-context generations against prompt-swap controls, using a neurophysiologically grounded metric set.
We train on CamCAN and Omega and run all analyses on held-out MOUS, establishing cross-dataset generalization.
Across metrics, generations remain relatively stable over long rollouts and are closer to the correct continuation than swapped controls are.
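As a rough illustration of two ideas named in the abstract, the sketch below shows one way a per-time-step grid of token ids could be flattened into a single stream, and one way a prompt-swap control could be scored. Everything here is an assumption made for illustration: the time-major flattening order, the function names `flatten_tokens`, `distance`, and `prompt_swap_margin`, and the mean-absolute-difference metric, which merely stands in for the neurophysiologically grounded metrics that the abstract does not specify.

```python
from statistics import mean


def flatten_tokens(codes):
    """Flatten a [time][channel] grid of token ids into one stream.

    Assumption: time-major order (all channels of step 0, then step 1, ...).
    """
    return [tok for step in codes for tok in step]


def distance(a, b):
    """Placeholder metric: mean absolute difference of two feature vectors."""
    return mean(abs(x - y) for x, y in zip(a, b))


def prompt_swap_margin(generated, targets):
    """Mean swapped-pair distance minus mean matched-pair distance.

    generated[i] holds features of the rollout produced from context i;
    targets[i] holds features of the real continuation of context i.
    A positive margin means rollouts sit closer to their own continuation
    than to continuations belonging to other contexts.
    """
    n = len(generated)
    matched = [distance(generated[i], targets[i]) for i in range(n)]
    swapped = [
        distance(generated[i], targets[j])
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return mean(swapped) - mean(matched)


if __name__ == "__main__":
    stream = flatten_tokens([[1, 2], [3, 4]])
    print(stream)

    gen = [[1.0, 2.0], [5.0, 6.0]]
    real = [[1.0, 2.5], [5.5, 6.0]]
    print(prompt_swap_margin(gen, real))
```

In this toy setup a margin near zero would suggest that the generations ignore their context, while a clearly positive margin would indicate conditional specificity under the chosen placeholder metric.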
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 12