Understanding stance classification of BERT models: an attention-based framework

Carlos Abel Córdova Sáenz, Karin Becker

Published: 2024, Last Modified: 01 Mar 2024, Knowl. Inf. Syst. 2024, Readers: Everyone

Abstract: BERT produces state-of-the-art solutions for many natural language processing tasks at the cost of interpretability. As prior works discuss the value of BERT’s attention weights for this purpose, we contribute to the field by examining this issue in the context of stance classification. We propose an interpretability framework to identify the most influential words for correctly predicting stances using BERT models. Unlike related work, we develop a broader level of interpretability focused on the overall model behaviour: we aggregate tokens’ attentions into words’ attention weights that can be semantically related to the domain, and we propose metrics to measure words’ relevance in correct predictions. We developed a broad experimental setting to analyse the premises underlying our framework regarding word attention scores and its capability concerning interpretability, adopting three case studies of stances expressed on Twitter on issues about the pandemic and four pre-trained BERT models. We concluded that our method is not affected by the characteristics of the BERT models’ vocabularies, that words with high absolute attention have a higher probability of positively influencing correct classification, and that the influential words represent the domains. We observed many words in common with a baseline method, but the words yielded by our method were considered more relevant according to a qualitative assessment.
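
The token-to-word aggregation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the aggregation is done, so everything here is an assumption: the function name `aggregate_wordpiece_attention` is hypothetical, continuation pieces are assumed to carry the WordPiece `##` prefix, special tokens are skipped, and a word's weight is taken as the sum of its pieces' weights. The attention values are made up for illustration only.

```python
def aggregate_wordpiece_attention(tokens, attentions):
    """Merge WordPiece sub-tokens into words and sum their attention weights.

    Assumptions (not taken from the paper): continuation pieces start with
    "##", special tokens are ignored, and aggregation is a plain sum.
    """
    if len(tokens) != len(attentions):
        raise ValueError("tokens and attentions must have the same length")

    words = []  # list of (word, aggregated_weight) in sentence order
    for token, weight in zip(tokens, attentions):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue  # special tokens are not words of the text
        if token.startswith("##") and words:
            # Continuation piece: extend the previous word and add its weight.
            word, total = words[-1]
            words[-1] = (word + token[2:], total + weight)
        else:
            words.append((token, weight))
    return words


# Illustrative input: one attention weight per token (values are invented).
tokens = ["[CLS]", "vacc", "##ination", "works", "[SEP]"]
attentions = [0.0, 0.25, 0.5, 0.125, 0.125]
print(aggregate_wordpiece_attention(tokens, attentions))
# → [('vaccination', 0.75), ('works', 0.125)]
```

Summation is only one possible choice; taking the mean or the maximum over a word's pieces would be equally plausible, and which one the framework uses would need to be checked against the full paper.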
