Keywords: sentiment analysis; interpretability; representation learning
TL;DR: Modeling sentiment in LLMs through usage factors improves reliability and interpretability.
Abstract: Large language models (LLMs) can encode high-level concepts as linear directions in their representation space, and sentiment has been studied within this framework. However, probe-derived sentiment directions often vary substantially across datasets, which compromises their reliability for downstream applications. Prior work addresses this issue with distributional methods such as Gaussian subspaces, which improve reliability but sacrifice direct interpretability of linguistic meaning. In this paper, we propose a usage-aware sentiment representation framework that grounds sentiment variability in linguistic usage factors drawn from linguistic research, such as tone, topic, context, and genre.
Our framework operates at two complementary levels of analysis. At the axis level, we construct sentiment directions from both pooled and usage-specific data to investigate how usage shapes sentiment representations. At the neuron level, we provide a finer-grained view by distinguishing usage-invariant neurons, which consistently encode sentiment, from usage-sensitive neurons, whose contributions vary across usages.
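As an illustration only, the sketch below shows one common way such axis- and neuron-level quantities could be computed: a difference-of-means direction per usage factor, and a per-neuron variance split across usage-specific directions. The function names, the `acts_by_usage` structure, and the 0.01 variance threshold are assumptions for the example, not the paper's exact procedure (which uses probe-derived directions).

```python
import numpy as np

def sentiment_direction(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means sentiment direction from hidden-state activations
    (pos_acts, neg_acts: [n_examples, hidden_dim])."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

# Axis level: a pooled direction vs. one direction per usage factor (tone, topic, ...).
# `acts_by_usage` maps a usage label to (positive, negative) activation matrices.
def usage_directions(acts_by_usage: dict) -> dict:
    return {u: sentiment_direction(p, n) for u, (p, n) in acts_by_usage.items()}

# Neuron level: neurons whose weights agree across usage-specific directions are
# treated as "usage-invariant"; high-variance neurons are "usage-sensitive".
def split_neurons(dirs: dict, var_threshold: float = 0.01):
    D = np.stack(list(dirs.values()))   # [n_usages, hidden_dim]
    variance = D.var(axis=0)            # per-neuron variance across usages
    invariant = np.where(variance <= var_threshold)[0]
    sensitive = np.where(variance > var_threshold)[0]
    return invariant, sensitive
```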
Experiments indicate that usage-aware sentiment representations enhance reliability, improving both classification accuracy and the controllability of sentiment steering. Finally, preliminary experiments with audio LLMs suggest that our framework generalizes beyond text, pointing toward cross-modal applicability.
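For context on what sentiment steering with such a direction typically involves, here is a minimal PyTorch sketch that shifts a layer's hidden states along a fixed direction via a forward hook. The layer index, the scaling factor `alpha`, and the `model.model.layers` attribute path are illustrative assumptions, not the paper's setup.

```python
import torch

def make_steering_hook(direction: torch.Tensor, alpha: float = 4.0):
    """Forward hook that shifts a layer's hidden states along a sentiment direction."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.device, hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage with a Llama-style decoder stack:
# handle = model.model.layers[12].register_forward_hook(make_steering_hook(d, alpha=4.0))
# ... generate text with the steered model ...
# handle.remove()
```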
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 9941