Brain-Predictive Reasoning Embedding through Residual Disentanglement

Published: 23 Sept 2025 · Last Modified: 29 Oct 2025 · NeurReps 2025 Poster · CC BY 4.0
Keywords: Disentangled Representations, Encoding Models, Reasoning, Language
TL;DR: Our work disentangles lexical, syntactic, semantic, and reasoning representations in large language models through residualization, and demonstrates their distinct neural alignment with human brain activity during natural language comprehension.
Abstract: Conventional brain-encoding analyses that feed a language model's whole hidden states can be biased toward shallow lexical cues. Here we present a residual-layer disentangling method that extracts four nearly orthogonal vectors from a language model, carrying information about lexicon, syntax, meaning, and reasoning, respectively. We first probe the model to locate the layers where each linguistic feature is maximal, then strip lower-level features layer by layer. Applying bootstrap-ridge encoding to natural-speech ECoG yields three insights: 1) Our residual pipeline isolates a reasoning embedding with unique predictive value, possible only because the latest large language models exhibit emergent reasoning behavior. 2) Apparent high-level predictive performance in conventional analyses is largely attributable to recycled shallow information rather than genuine deep processing. 3) The reasoning embedding reveals distinct spatiotemporal brain-activation patterns, including recruitment of frontal and visual regions beyond classical language areas, suggesting a potential neural substrate for high-level reasoning. Together, our approach removes shallow bias, aligns distinct transformer strata with brain hierarchies, and provides the first brain-relevant representation of reasoning.
Submission Number: 148