Stay on Topic, Please: Aligning User Comments to the Content of a News Article

Jumanah Alshehri, Marija Stanojevic, Eduard C. Dragut, Zoran Obradovic

ECIR (1) 2021 (modified: 07 Nov 2021)

Abstract: Social scientists have shown that up to 50% of the comments posted to a news article have no relation to its journalistic content. In this study, we propose a classification algorithm that categorizes user comments posted to a news article based on their alignment to its content. The alignment seeks to match user comments to an article based on similarity of content, entities under discussion, and topics. We propose BERTAC, a BERT-based approach that jointly learns article-comment embeddings and infers the relevance class of comments. We introduce an ordinal classification loss that penalizes the difference between the predicted and true labels, and we conduct a thorough study to show the influence of the proposed loss on the learning process. The results on five representative news outlets show that our approach can learn the comment class with up to 36% average accuracy improvement compared to the baselines, and up to 25% compared to BA-BC, an approach of ours that consists of two models aimed at separately capturing the formal language of news articles and the informal language of comments. We also conduct a user study to evaluate human labeling performance and to understand the difficulty of the classification task. The user agreement on comment-article alignment is “moderate” per Krippendorff’s alpha score, which suggests that the classification task is difficult.
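The abstract does not give the formula for the ordinal classification loss. As a rough illustration of the general idea only, the sketch below adds a distance penalty to ordinary cross-entropy so that predictions far from the true relevance class cost more than near misses. The function names, the `alpha` weight, and the specific penalty (expected absolute distance between predicted and true class) are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores into class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ordinal_loss(logits, true_label, alpha=1.0):
    """Hypothetical ordinal loss: cross-entropy plus a distance penalty.

    Assumption: classes are ordered integers 0..K-1 (e.g. irrelevant,
    partially relevant, relevant). The penalty is the expected absolute
    distance between the predicted class and the true class, weighted
    by alpha. This is an illustrative choice, not the paper's formula.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_label])
    expected_distance = sum(
        p * abs(k - true_label) for k, p in enumerate(probs)
    )
    return cross_entropy + alpha * expected_distance


# Both predictions give the true class (0) the same probability,
# but the second one puts its mass on a class that is further away.
near_miss = ordinal_loss([0.0, 3.0, 0.0], true_label=0)
far_miss = ordinal_loss([0.0, 0.0, 3.0], true_label=0)
print(far_miss > near_miss)
```

With `alpha=0` the function reduces to plain cross-entropy, which treats both mistakes above identically; the distance term is what makes the loss sensitive to the ordering of the classes.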
