Abstract: Transformer-based Large Language Models (LLMs) are widely used as foundation models for sentiment analysis, yet they have long been criticized for their lack of explainability and transparency. Most existing research focuses on interpreting the salience of input tokens with respect to model predictions, while the inner workings of LLMs in sentiment analysis remain under-explored. In this work, we explore the hidden states of Transformer-based LLMs in relation to the sentiment conveyed by input texts. Specifically, we analyze the hidden states output by each layer and each head of a RoBERTa model fine-tuned for sentiment analysis, in order to examine the sentiment-related knowledge embedded in them. To this end, we apply three different clustering algorithms to probe whether each layer or head encodes sufficient knowledge to distinguish sentiment. Our experiments reveal that text length and frequency affect the token representations in the hidden layers, and that not all heads within a layer contribute to the final prediction, indicating redundancy in parts of the model's internal structure. Additionally, a part-of-speech analysis suggests that the hidden states also encode information about part-of-speech tags. We further explore the internal mechanisms of RoBERTa through experiments on word sense disambiguation and entailment.
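To make the probing setup concrete, the sketch below shows one way to extract per-layer hidden states from a fine-tuned RoBERTa sentiment model and cluster them against gold sentiment labels. It is a minimal illustration, not the paper's exact pipeline: the checkpoint name is illustrative, KMeans stands in for one of the three (unnamed here) clustering algorithms, and the adjusted Rand index is used as an example agreement measure.

```python
# Minimal sketch: probe each layer's hidden states for sentiment knowledge.
# Assumptions: a fine-tuned RoBERTa sentiment checkpoint on the Hugging Face Hub
# (name below is illustrative) and KMeans as a representative clustering algorithm.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

model_name = "cardiffnlp/twitter-roberta-base-sentiment"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, output_hidden_states=True
)
model.eval()

texts = ["I loved this movie.", "The plot was a complete mess."]
labels = [1, 0]  # gold sentiment labels for the probing texts

with torch.no_grad():
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**enc)

# out.hidden_states is a tuple: (embeddings, layer 1, ..., layer 12) for roberta-base.
# For each layer, use the <s> (CLS-equivalent) token as a sentence representation and
# check whether unsupervised clustering recovers the sentiment split.
for layer_idx, layer_states in enumerate(out.hidden_states):
    sent_repr = layer_states[:, 0, :].numpy()            # [batch, hidden_dim]
    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(sent_repr)
    score = adjusted_rand_score(labels, clusters)         # agreement with gold labels
    print(f"layer {layer_idx}: ARI = {score:.3f}")
```

A per-head variant would slice each layer's output along the hidden dimension into head-sized segments before clustering, allowing head-level comparisons of the same kind.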