What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores

15 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: Large language models, Neuroscience, Neural encoding, fMRI, Replication, Language
TL;DR: We find that on three widely used neural datasets, much of the neural encoding performance of LLMs is driven by simple features, and we urge researchers to rigorously interpret what LLMs are mapping to in neural signals before drawing conclusions.
Abstract: Given the remarkable capabilities of large language models (LLMs), there has been a growing interest in evaluating their similarity to the human brain. One approach to quantifying this similarity is to measure how well a model predicts neural signals, a metric known as the "brain score". Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. This inference is valid only if the subset of neural activity predicted by LLMs reflects core elements of language processing. Here, we question this assumption by analyzing three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset in which participants read short passages. We first find that when using shuffled train-test splits, as done in previous studies with these datasets, a trivial feature that encodes temporal autocorrelation not only outperforms LLMs but also accounts for the majority of neural variance that LLMs explain. We therefore caution against shuffled train-test splits and use contiguous test splits moving forward. Second, we explain the surprising result that untrained LLMs have higher-than-expected brain scores by showing they do not account for additional neural variance beyond two simple features: sentence length and sentence position. This undermines evidence used to claim that the transformer architecture biases computations to be more brain-like. Third, we find that brain scores of trained LLMs on this dataset can largely be explained by sentence position, sentence length, and static word vectors; a small additional amount is explained by sense-specific word embeddings and contextual representations of sentence structure. We conclude that over-reliance on brain scores can lead to over-interpretation of similarity between LLMs and brains, and emphasize the importance of deconstructing what LLMs are mapping to in neural signals.
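
To make the train-test-split point concrete, the sketch below shows how a brain score is typically computed as cross-validated encoding performance, and why shuffled splits can reward a trivial time-based feature when the neural signal is temporally autocorrelated. This is a minimal illustration, not the paper's actual pipeline: the synthetic AR(1) responses, the Gaussian-bump time features, the ridge regression model, and all sizes and names are assumptions made here for clarity.

```python
"""Hedged sketch of a brain-score-style encoding analysis (assumptions, not the paper's code)."""
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_trs, n_voxels = 1000, 50

# Synthetic "fMRI" responses: an AR(1) process per voxel, i.e. a signal that is
# strongly autocorrelated in time but carries no stimulus information at all.
Y = np.zeros((n_trs, n_voxels))
Y[0] = rng.standard_normal(n_voxels)
for t in range(1, n_trs):
    Y[t] = 0.95 * Y[t - 1] + rng.standard_normal(n_voxels) * np.sqrt(1 - 0.95**2)

# A trivial feature set that only encodes *when* a TR occurred:
# Gaussian bumps over the TR index (a smooth function of time).
t_idx = np.arange(n_trs)
centers = np.linspace(0, n_trs - 1, 100)
X_trivial = np.exp(-0.5 * ((t_idx[:, None] - centers[None, :]) / 10.0) ** 2)

def brain_score(X, Y, splitter):
    """Mean voxel-wise Pearson r between held-out responses and ridge predictions."""
    fold_scores = []
    for train_idx, test_idx in splitter.split(X):
        model = RidgeCV(alphas=np.logspace(-2, 4, 7))
        model.fit(X[train_idx], Y[train_idx])
        pred = model.predict(X[test_idx])
        r = [np.corrcoef(pred[:, v], Y[test_idx, v])[0, 1] for v in range(Y.shape[1])]
        fold_scores.append(np.nanmean(r))
    return float(np.mean(fold_scores))

# Shuffled folds interleave test TRs among training TRs, so a smooth function of
# time can interpolate the autocorrelated signal; contiguous folds hold out whole
# blocks, where the same feature has nothing nearby to interpolate from.
print("shuffled  :", brain_score(X_trivial, Y, KFold(n_splits=5, shuffle=True, random_state=0)))
print("contiguous:", brain_score(X_trivial, Y, KFold(n_splits=5, shuffle=False)))
```

In this synthetic setting the shuffled-split score comes out far higher than the contiguous-split score even though the features carry no stimulus information, which mirrors the failure mode the abstract describes.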
Supplementary Material: zip
Primary Area: Neuroscience and cognitive science (neural coding, brain-computer interfaces)
Flagged For Ethics Review: true
Submission Number: 21527