Sentiment and intent classification of in-text citations using BERT

Ruan Visser, Marcel Dunaiski

Published: 22 Jun 2022, Last Modified: 20 Jun 2024. Proceedings of the 43rd Conference of the South African Institute of Computer Scientists and Information Technologists. License: CC BY 4.0

Abstract: Methods such as the h-index and the journal impact factor are commonly used by the scientific community to quantify the quality or impact of research output. These methods rely primarily on citation frequency and do not take the context of citations into consideration. Furthermore, they weigh each citation equally, ignoring valuable citation characteristics such as citation intent and sentiment. The correct classification of citation intents and sentiments can therefore be used to further improve scientometric impact metrics. In this paper, we evaluate BERT for intent and sentiment classification of in-text citations of articles contained in the database of the Association for Computing Machinery (ACM) library. We analyse various BERT models that are fine-tuned with appropriately labelled datasets for citation sentiment classification and citation intent classification. Our results show that BERT can be used effectively to classify in-text citations. We also find that shorter citation context ranges can significantly improve classification performance. Lastly, we evaluate these models on a manually annotated test dataset for sentiment classification and find that BERT-cased and SciBERT-cased perform best.
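
To make the notion of a "citation context range" concrete, the following is a minimal sketch of how a context window around an in-text citation might be extracted before it is passed to a classifier. It is not the authors' pipeline: the function name `citation_context`, the bracketed marker format, the `window` parameter, and the naive regex sentence splitter are all assumptions chosen for illustration.

```python
import re


def citation_context(text, marker, window=0):
    """Return the sentence containing `marker` plus `window` sentences on each side.

    Hypothetical helper for illustration only; the sentence splitter is naive
    and would mis-split abbreviations such as "et al." in real papers.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sentence in enumerate(sentences):
        if marker in sentence:
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            return " ".join(sentences[start:end])
    return None  # marker not found


# Assumed example passage with a bracketed citation marker.
passage = (
    "Prior work is extensive. "
    "The method of [12] fails on noisy data. "
    "We propose a fix."
)

narrow = citation_context(passage, "[12]", window=0)  # citing sentence only
wide = citation_context(passage, "[12]", window=1)    # one sentence either side
```

With `window=0` only the citing sentence is kept, while larger values add neighbouring sentences. Under the abstract's finding, the narrower setting would be the one expected to help classification, although the exact ranges the authors compared are not stated here.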