Keywords: uncertainty estimation; large language models; linear probing; interpretability
Abstract: Uncertainty estimation in Large Language Models (LLMs) is challenging because token-level uncertainty includes uncertainty over lexical and syntactic variation, and thus fails to accurately capture uncertainty over the semantic meaning of the generation. To address this, Farquhar et al. recently introduced semantic uncertainty (SE), which quantifies uncertainty in semantic meaning by aggregating token-level probabilities over clusters of semantically equivalent generations. Kossen et al. further demonstrated that SE can be cheaply and reliably captured using linear probes on the model's hidden states. In this work, we build on these results and show that semantic uncertainty in LLMs can be predicted from only a very small set of neurons. We find these neurons by training linear probes with $L_1$ regularization. Our approach matches the performance of full-neuron probes in predicting SE. An intervention study further shows that these neurons causally affect the semantic uncertainty of model generations. Our findings reveal how hidden-state neurons encode semantic uncertainty, present a method to manipulate this uncertainty, and contribute insights to the field of interpretability research.
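As a rough illustration of the probing setup the abstract describes, the sketch below fits an $L_1$-regularized linear probe that maps hidden states to semantic entropy and reads off the neurons with nonzero weight. This is a minimal, hypothetical example: the data arrays, the `alpha` value, and the use of scikit-learn's `Lasso` are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: an L1-regularized linear probe predicting semantic
# entropy (SE) from LLM hidden states. Data and hyperparameters are placeholders.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Assume hidden_states has shape (n_examples, hidden_dim) and semantic_entropy
# has shape (n_examples,), collected from a separate SE-labeling pipeline.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 4096))   # placeholder features
semantic_entropy = rng.uniform(size=1000)        # placeholder SE targets

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, semantic_entropy, test_size=0.2, random_state=0
)

# The L1 penalty drives most probe weights to exactly zero, so the nonzero
# coefficients pick out a small set of neurons associated with SE.
probe = Lasso(alpha=0.01)
probe.fit(X_train, y_train)

se_neurons = np.nonzero(probe.coef_)[0]
print(f"Test R^2: {probe.score(X_test, y_test):.3f}")
print(f"Neurons with nonzero weight: {len(se_neurons)} / {hidden_states.shape[1]}")
```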
Email Of Author Nominated As Reviewer: jiatong.han@u.nus.edu
Submission Number: 7