Rethinking Relation-Specific Neurons in Large Language Models

Lea Hirlimann; Sebastian Gerstner; François Yvon; Hinrich Schuetze

Published: 11 Jun 2026, Last Modified: 17 Jun 2026; Mech Interp Workshop ICML 2026, Virtual Poster; CC BY 4.0

Keywords: Methods (probing, steering, causal interventions)

Other Keywords: MLP activations, semantic relations

TL;DR: This paper examines how relation-specific neurons in language models are identified using contrastive activations, and shows that conclusions about internal representations depend on methodological choices and prior assumptions.
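The TL;DR describes identifying relation-specific neurons from contrastive activations. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes a matrix of MLP activations with one row per prompt, scores each neuron by the difference between its mean activation on one relation's prompts and on all other prompts, and contrasts a fixed top-k budget with a threshold relative to the best score. The function names, the threshold rule, the relation name "capital", and the toy data are all hypothetical.

```python
import numpy as np


def contrastive_scores(acts: np.ndarray, labels: np.ndarray, relation: str) -> np.ndarray:
    """Mean activation on prompts of `relation` minus mean activation on all other prompts.

    acts:   (n_prompts, n_neurons) activations, e.g. from one MLP layer (assumption).
    labels: (n_prompts,) relation name for each prompt.
    """
    in_rel = labels == relation
    if in_rel.all() or not in_rel.any():
        raise ValueError("need prompts both inside and outside the relation")
    return acts[in_rel].mean(axis=0) - acts[~in_rel].mean(axis=0)


def select_top_k(scores: np.ndarray, k: int) -> set:
    """Fixed budget: the k highest-scoring neurons, the same k for every relation."""
    order = np.argsort(-scores, kind="stable")  # descending; ties keep index order
    return set(order[:k].tolist())


def select_by_threshold(scores: np.ndarray, frac: float = 0.5) -> set:
    """Adaptive budget (hypothetical rule): neurons scoring at least `frac` of the best score."""
    best = scores.max()
    if best <= 0:
        return set()
    return set(np.flatnonzero(scores >= frac * best).tolist())


# Toy activations (not real data): 4 prompts x 5 neurons, two relations.
labels = np.array(["capital", "capital", "product_company", "product_company"])
acts = np.zeros((4, 5))
acts[:2, 0] = 3.0  # neuron 0 fires on "capital" prompts
acts[:2, 1] = 2.0  # neuron 1 fires more weakly on "capital" prompts
acts[2:, 3] = 1.0  # neuron 3 fires on "product_company" prompts

for rel in ["capital", "product_company"]:
    s = contrastive_scores(acts, labels, rel)
    print(rel, select_top_k(s, 2), select_by_threshold(s))
```

On this toy input, the fixed budget of two neurons pulls a zero-score neuron into the product_company set, while the relative threshold keeps two neurons for capital and one for product_company. This shows how the selection rule alone can change the size of each relation's neuron set.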

Abstract: Previous work has identified relation-specific neurons that selectively activate on specific semantic relations in factual knowledge tasks. However, the conclusions we draw about these representations depend heavily on the methodological assumptions underlying the procedure used to identify them. We systematically examine three such assumptions, showing that (i) the number of relevant neurons varies across relations; (ii) the choice of internal signal used for neuron identification shapes the results; and (iii) cross-relation entanglement is structural rather than an artifact of subject overlap. We additionally present a preliminary investigation of the mismatch between benchmark-defined relation categories and the model's internal organization. For instance, we show that the absence of a strong expert set for the product_company relation reflects conceptual heterogeneity within the category rather than a failure of localization, and that targeted ablation of the subrelation car_company yields substantially stronger results. Together, our findings show that the apparent structure of relational representations is jointly shaped by the model's internal organization and the methodological lens used to study it.
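The abstract refers to cross-relation entanglement and to targeted ablation. One common way to make both concrete, sketched here under assumptions and not taken from the paper, is to measure overlap between two relations' neuron sets with the Jaccard index and to ablate a neuron set by zeroing its activations. Zero-ablation is only one option (mean-ablation is another), and the abstract does not say which variant the authors use. The neuron sets, the relation names "capital" and "country_language", and the function names below are hypothetical.

```python
import numpy as np


def jaccard(a: set, b: set) -> float:
    """Overlap of two neuron sets: 0.0 if disjoint, 1.0 if identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def zero_ablate(acts: np.ndarray, neurons: set) -> np.ndarray:
    """Return a copy of (n_prompts, n_neurons) activations with the given neurons set to 0."""
    out = acts.copy()
    out[:, sorted(neurons)] = 0.0
    return out


# Hypothetical neuron sets for two relations (indices into one MLP layer).
capital = {0, 1}
country_language = {1, 4}
print(jaccard(capital, country_language))  # 1 shared neuron out of 3 in the union

acts = np.ones((2, 5))
ablated = zero_ablate(acts, capital)
print(ablated)  # columns 0 and 1 zeroed; the input array is left unchanged
```

In practice the zeroing would happen inside the model's forward pass (for example through a hook on the MLP layer), so that downstream layers see the ablated activations; the array version only shows the masking step. To probe whether overlap is driven by shared subjects, one could recompute the neuron sets after removing prompts whose subjects appear in both relations and compare the resulting Jaccard values.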

Submission Number: 432
