The Grammar of a Polysemantic Neuron: Understanding How Neurons Compress Multiple Concepts

ICLR 2026 Conference Submission 13451 Authors (anonymous)

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Interpretability, Explainability, Computer Vision
Abstract: A pivotal recent challenge in neural network interpretability is polysemanticity, where a single neuron is activated by multiple, often unrelated concepts. This phenomenon obstructs a straightforward functional understanding of individual neurons. Despite its significance, polysemanticity has not yet been examined systematically and comprehensively. In this paper, we provide the first in-depth analysis of how polysemanticity emerges across architectures and layers. Our key contributions are as follows: (1) we introduce effective methods to disentangle the visual concept clusters encoded within a single neuron across diverse model architectures; and (2) using this approach, we conduct a systematic investigation of polysemanticity, from its properties across models to the pathways underlying its formation. We believe this work underscores the necessity of shifting the unit of analysis from individual neurons to the concept clusters they encode.
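The abstract does not describe the disentangling method itself, but the general idea of recovering concept clusters from a single neuron can be illustrated with a minimal sketch: assume we cluster semantic embeddings of the neuron's top-activating images and pick the cluster count by silhouette score. All names, thresholds, and the synthetic data below are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: disentangle a polysemantic neuron's concept clusters
# by clustering embeddings of its top-activating inputs. The paper's actual
# method is not specified on this page; all choices here are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Stand-ins for real data: one neuron's activation per image, plus a
# semantic embedding (e.g., penultimate-layer features) for each image.
n_images, embed_dim = 10_000, 128
activations = rng.gamma(shape=2.0, scale=1.0, size=n_images)
embeddings = rng.normal(size=(n_images, embed_dim))

# 1) Keep the images that drive the neuron hardest.
top_k = 200
top_idx = np.argsort(activations)[-top_k:]
top_embeds = embeddings[top_idx]

# 2) Search over cluster counts; a polysemantic neuron should prefer k > 1,
#    since its top images split into several semantically distinct groups.
best_k, best_score = 1, -1.0
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(top_embeds)
    score = silhouette_score(top_embeds, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"estimated concept clusters: {best_k} (silhouette={best_score:.2f})")
```

On real data, each recovered cluster would be inspected (e.g., by viewing its member images) to name the concept it represents; with synthetic Gaussian embeddings, as here, no meaningful clusters are expected.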
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 13451