Latent Structure of Affective Representations in Large Language Models

ICLR 2026 Conference Submission 13053 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Language Models, Affective Computing, Latent Representations
TL;DR: We show that LLMs encode emotions in coherent geometric structures aligned with human valence–arousal maps, and that these structures can be leveraged for uncertainty quantification.
Abstract: The geometric structure of latent representations in large language models (LLMs) is an active area of research, driven in part by its implications for model transparency and AI safety. Existing literature has focused mainly on general geometric and topological properties of the learned representations, but in the absence of ground-truth latent geometry, validating the findings of such approaches is challenging. Emotion processing provides an intriguing testbed for probing representational geometry, as emotions exhibit both categorical organization and continuous affective dimensions, both of which are well established in the psychology literature. Moreover, understanding such representations carries safety relevance. In this work, we investigate the latent structure of affective representations in LLMs using tools from geometric data analysis. We present three main findings. First, we show that LLMs learn coherent latent representations of emotions that align both with widely used valence–arousal models from psychology and with patterns observed in human brainwave data. Second, we find that these representations exhibit nonlinear geometric structure that can nonetheless be well approximated linearly, providing empirical support for the linear representation hypothesis commonly assumed in model transparency methods. Third, we demonstrate that the learned latent representation space can be leveraged to quantify uncertainty in emotion processing tasks. Our results are based on experiments with the GoEmotions corpus, which contains $\sim$58,000 text comments with manually annotated emotion labels.
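
As a rough illustration of the kind of analysis the abstract describes (not the paper's actual pipeline), one might extract hidden-state embeddings for emotion-labeled sentences and check how much of their variance a low-dimensional linear subspace captures. The model choice ("gpt2"), the layer, and the toy sentences below are assumptions made purely for illustration; the paper works with the GoEmotions corpus and its own methodology.

```python
# Illustrative sketch only -- not the authors' method. A small open model
# ("gpt2") and a handful of hand-written emotion-labeled sentences stand in
# for the actual corpus and models used in the paper.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.decomposition import PCA

sentences = {
    "joy":     "I just got the best news of my life!",
    "sadness": "I miss them more than words can say.",
    "anger":   "This is completely unacceptable and unfair.",
    "fear":    "I can't stop worrying about what happens next.",
}

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

embeddings = []
for text in sentences.values():
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # Mean-pool the final hidden layer as a simple sentence embedding.
    embeddings.append(out.hidden_states[-1].mean(dim=1).squeeze(0))

X = torch.stack(embeddings).numpy()

# Project onto 2 principal components; a high explained-variance ratio would
# be (weak) evidence that a low-dimensional linear subspace approximates the
# affective structure well.
pca = PCA(n_components=2)
coords = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
for label, (x, y) in zip(sentences, coords):
    print(f"{label:8s} -> ({x:+.2f}, {y:+.2f})")
```

Whether the two recovered axes correspond to anything like valence and arousal would require comparison against human annotations, which is the kind of alignment the paper investigates at scale.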
Primary Area: interpretability and explainable AI
Submission Number: 13053