Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
Abstract: Knowledge neuron theory, which posits that facts are stored within multi-layer perceptron neurons, provides a key approach to understanding the mechanisms of factual knowledge in Large Language Models (LLMs). This paper further explores **Degenerate Knowledge Neurons** (DKNs): distinct sets of neurons that can store the same fact but, unlike simple redundancy, also participate in storing other, different facts.
Despite the novelty and unique properties of this concept, it has not yet been rigorously defined or systematically studied.
Our contributions are: (1) We pioneer the study of structure in knowledge neurons by analyzing their weight connection patterns, providing a comprehensive definition of DKNs from both functional and structural perspectives. (2) Based on this definition, we develop the **Neuronal Topology Clustering** method, which identifies DKNs more accurately. (3) We demonstrate two practical applications of DKNs: guiding LLMs to learn new knowledge and explaining LLMs' robustness to input errors.
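The following is a minimal, hypothetical sketch (not the paper's actual **Neuronal Topology Clustering** implementation) of how MLP neurons might be grouped by the similarity of their weight connection patterns, the structural signal the abstract refers to. The matrix shapes, the cosine-style distance, the `distance_threshold` value, and the use of scikit-learn's agglomerative clustering are all assumptions made for illustration.

```python
# Hypothetical illustration only: cluster MLP neurons whose output weight
# vectors point in similar directions, as candidate "degenerate" groups.
# This is NOT the authors' Neuronal Topology Clustering method.
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def cluster_neurons_by_weights(w_out: np.ndarray, distance_threshold: float = 0.3) -> np.ndarray:
    """Group neurons by the similarity of their weight connection patterns.

    w_out: (num_neurons, hidden_dim) array, e.g. the rows of one layer's
           second MLP matrix, where row i is what neuron i writes back
           into the residual stream.
    Returns one cluster label per neuron.
    """
    # Normalize rows so Euclidean distance between them is monotone in cosine distance.
    w_unit = w_out / (np.linalg.norm(w_out, axis=1, keepdims=True) + 1e-8)
    clustering = AgglomerativeClustering(
        n_clusters=None,                      # let the threshold decide how many groups form
        distance_threshold=distance_threshold,
        linkage="average",
    )
    return clustering.fit_predict(w_unit)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: neurons 0-2 share one weight direction, 3-4 share another,
    # and the rest are unrelated (a simulated degenerate structure).
    a, b = rng.normal(size=16), rng.normal(size=16)
    neurons = np.vstack(
        [a + 0.05 * rng.normal(size=16) for _ in range(3)]
        + [b + 0.05 * rng.normal(size=16) for _ in range(2)]
        + [rng.normal(size=16) for _ in range(3)]
    )
    print(cluster_neurons_by_weights(neurons))  # the two correlated groups share labels
```

In this toy setting, the two groups of correlated neurons each collapse into a shared cluster label, while the unrelated neurons remain singletons; identifying whether such weight-level groups also share stored facts is what distinguishes degeneracy from coincidental similarity.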
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: knowledge tracing/discovering/inducing
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 222