Combination Generalization of Capability-Specific Neurons in LLMs

ACL ARR 2025 February Submission7508 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Although large language models (LLMs) exhibit various exciting capabilities, understanding the mechanisms behind these capabilities remains a challenging problem. In this paper, we aim to understand these mechanisms from the perspective of neurons. Specifically, we first propose a method for Detecting Capability-Specific Neurons (DCSN). Extensive enhancement and erasure experiments demonstrate that the detected neurons are highly correlated with specific capabilities and exhibit strong cohesion and separability; we therefore define them as capability-specific neurons. Moreover, leveraging these neurons, we conduct compositional experiments and, for the first time, discover that capability-specific neurons exhibit compositional generalization. Inspired by these findings, we propose a Capability Neuron-Level Fine-tuning (CNLF) method that fine-tunes specific capability neurons to achieve performance improvements across datasets and tasks. Extensive experiments validate the effectiveness of this method, which offers a low-cost, highly generalizable fine-tuning paradigm. Our research offers interpretable insights into the capability mechanisms of LLMs.
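
The abstract mentions enhancement and erasure experiments on detected neurons. The sketch below is a minimal, hypothetical illustration of that general idea (scaling the activations of selected MLP neurons up for "enhancement" or to zero for "erasure"); it is not the authors' DCSN implementation. The model name, layer/module paths, and the `capability_neurons` indices are assumptions for illustration only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; the paper's models may differ
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical (layer_index, neuron_index) pairs for one capability.
capability_neurons = {(5, 1023), (5, 77), (9, 512)}

def make_hook(layer_idx, scale):
    """Scale selected intermediate MLP activations in one transformer block.

    scale > 1.0 roughly corresponds to 'enhancement'; scale = 0.0 to 'erasure'.
    """
    neuron_ids = [n for (l, n) in capability_neurons if l == layer_idx]

    def hook(module, inputs, output):
        if neuron_ids:
            output[..., neuron_ids] = output[..., neuron_ids] * scale
        return output

    return hook

# Register hooks on the MLP activation of each GPT-2 block (erasure here).
handles = []
for layer_idx, block in enumerate(model.transformer.h):
    handles.append(block.mlp.act.register_forward_hook(make_hook(layer_idx, scale=0.0)))

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0]))

for h in handles:
    h.remove()  # restore the unmodified model
```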
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Combination Generalization, Capability-Specific Neurons, LLMs
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English, Chinese
Submission Number: 7508