Crosscoders Identify Shared or Specific Features between the Human Brain and Language Models

Koshiro Aoki; Itsuki Hamada; Naho Orita; Daisuke Kawahara; Hiromu Sakai

Published: 11 Jun 2026, Last Modified: 11 Jun 2026. Mech Interp Workshop ICML 2026 Poster. CC BY 4.0.

Keywords: Concept Discovery (e.g., SAEs, dictionary learning), Interpretability for Knowledge Discovery

Other Keywords: crosscoders, neuroscience

TL;DR: We decompose brain and language model representations into shared and specific features via crosscoders, finding that embodied semantics are brain-specific and colloquial expressions are LM-specific.

Abstract: To what extent do human brains and language models (LMs) share internal representations of language, and how do these representations differ? Prior work has shown that LM representations can predict brain responses to naturalistic language stimuli, suggesting that the two systems encode common information. However, which features are shared between brain and LM representations and which are selectively used in brains and LMs have remained underspecified. We propose Brain-LM crosscoders, which decompose brain responses and LM representations into shared sparse features and label each feature as being shared, brain-specific, or LM-specific based on its predictive contribution to each representation. Experiments on naturalistic language listening fMRI data show that language associated with body, family, and action tends to be brain-specific, whereas colloquial expressions tend to be LM-specific. Brain-LM crosscoders compare biological and artificial language representations at the feature level, which will contribute to scientific discovery in both neuroscience and artificial neural network research.
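To make the labeling step concrete, here is a minimal sketch of one way a feature could be assigned to the shared, brain-specific, or LM-specific category. It assumes a trained crosscoder with one decoder vector per feature for each side, and it uses the relative decoder norm as a stand-in for "predictive contribution". The abstract does not state the exact criterion, so the function name `label_features`, the thresholds `low` and `high`, and the toy decoder values are all hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a plain list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def label_features(dec_brain, dec_lm, low=0.3, high=0.7):
    """Label each feature from its two decoder vectors.

    dec_brain[i] and dec_lm[i] are the (hypothetical) decoder vectors that
    map feature i back to brain responses and LM activations respectively.
    The brain share of the total decoder norm decides the label; the
    thresholds are illustrative assumptions, not values from the paper.
    """
    labels = []
    for w_brain, w_lm in zip(dec_brain, dec_lm):
        norm_brain = l2_norm(w_brain)
        norm_lm = l2_norm(w_lm)
        total = norm_brain + norm_lm
        if total == 0.0:
            # The feature contributes to neither reconstruction.
            labels.append("dead")
            continue
        brain_share = norm_brain / total
        if brain_share >= high:
            labels.append("brain-specific")
        elif brain_share <= low:
            labels.append("LM-specific")
        else:
            labels.append("shared")
    return labels


# Toy decoders for three features (made-up numbers for illustration only).
dec_brain = [[1.0, 0.0], [0.1, 0.0], [0.5, 0.5]]
dec_lm = [[0.0, 0.1], [2.0, 0.0], [0.5, 0.5]]
print(label_features(dec_brain, dec_lm))
```

A criterion based on reconstruction loss (for example, how much each side's prediction error grows when a feature is ablated) would follow the abstract's wording more closely. The norm-based version is shown only because it needs nothing beyond the decoder weights.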

Submission Number: 644
