Reconstructing the cascade of language processing in the brain using the internal computations of a transformer-based language model
Abstract: Piecing together the meaning of a narrative requires understanding both individual words and the intricate relationships between them. How does the brain construct this kind of rich, contextual meaning? Recently, a new class of artificial neural networks—based on the Transformer architecture—has revolutionized the field of language modeling. Transformers integrate information across words via multiple layers of structured circuit computations, forming increasingly contextualized representations of linguistic content. In this paper, we deconstruct these circuit computations and analyze the associated “transformations” (alongside the more commonly studied “embeddings”) to provide a fine-grained window onto linguistic computations in the human brain. Using functional MRI data acquired while participants listened to naturalistic spoken stories, we find that these transformations capture a hierarchy of linguistic computations across cortex, with transformations at later layers in the model mapping onto higher-level language areas in the brain. We then decompose these transformations into individual, functionally-specialized “attention heads” and demonstrate that the emergent syntactic computations performed by individual heads correlate with predictions of brain activity in specific cortical regions. These heads fall along gradients corresponding to different layers, contextual distances, and syntactic dependencies in a low-dimensional cortical space. Our findings indicate that large language models and the cortical language network converge on similar trends of computational specialization for processing natural language.
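To make the distinction between "embeddings" and "transformations" concrete, here is a minimal sketch, not the authors' released code, using GPT-2 from the HuggingFace transformers library as a stand-in model. It captures the residual-stream hidden states ("embeddings") and the per-head attention outputs ("transformations", recovered by hooking the input to each layer's output projection before the heads are mixed), then fits a ridge-regression encoding model of the kind used to predict voxel responses; the fMRI data here are simulated, and the model, layer, and head choices are illustrative assumptions rather than the paper's exact pipeline.

```python
# Sketch: extract per-layer "embeddings" and per-head "transformations" from
# GPT-2, then fit a ridge encoding model (simulated fMRI responses).

import numpy as np
import torch
from transformers import GPT2Model, GPT2TokenizerFast
from sklearn.linear_model import Ridge

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

n_layers, n_heads = model.config.n_layer, model.config.n_head
head_dim = model.config.n_embd // n_heads

# Capture the concatenated per-head outputs (the input to c_proj) at each layer.
per_head_outputs = {}

def make_hook(layer_idx):
    def hook(module, inputs, output):
        x = inputs[0].detach()  # (batch, seq, n_embd): heads concatenated along last dim
        per_head_outputs[layer_idx] = x.reshape(x.shape[0], x.shape[1], n_heads, head_dim)
    return hook

handles = [block.attn.c_proj.register_forward_hook(make_hook(i))
           for i, block in enumerate(model.h)]

text = "After the storm passed, the children ran outside to play."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
for h in handles:
    h.remove()

# "Embeddings": hidden state of the final token after each layer.
embeddings = torch.stack(outputs.hidden_states)[:, 0, -1, :]        # (n_layers + 1, 768)
# "Transformations": per-head attention output for the final token at each layer.
transformations = torch.stack(
    [per_head_outputs[i][0, -1] for i in range(n_layers)])          # (n_layers, n_heads, 64)
print(embeddings.shape, transformations.shape)

# Encoding-model sketch: ridge regression from word-level features to voxel
# responses, fit separately per feature space (here with random stand-in data).
rng = np.random.default_rng(0)
n_words, n_voxels = 200, 50
X = rng.standard_normal((n_words, n_heads * head_dim))  # stand-in transformation features
Y = rng.standard_normal((n_words, n_voxels))            # stand-in fMRI responses
encoder = Ridge(alpha=10.0).fit(X[:150], Y[:150])
print("held-out R^2 (random data, near 0):", encoder.score(X[150:], Y[150:]))
```

In an analysis like the one described above, the held-out prediction performance of each layer's or each head's feature space would be mapped across cortex to ask which regions are best explained by which computations; the random data here only illustrate the fitting step.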