Abstract: Detecting cite-worthiness in text is the problem of flagging a missing reference to a scientific result (an article or a dataset) that should support a claim made in the text. Previous work has addressed this problem in the context of scientific literature, motivated by the need to recommend references to researchers and to flag missing citations in scientific work. In this preliminary study, we extend this idea to social media. As scientific claims are often made to support various arguments in societal debates on the Web, it is crucial to flag non-referenced or unsupported claims that relate to science, as this promises to improve the quality of online debates. We experiment with baseline models, initially tested on scientific literature, by applying them to the SciTweets dataset, which gathers science-related claims from X. We show that models trained on scientific papers struggle to detect cite-worthy text from X. We discuss the implications of these results and argue that models must be trained on social media corpora to flag missing references on social media satisfactorily. We make our data publicly available to encourage further research on cite-worthiness detection on social media.
External IDs: dblp:conf/nslp/HafidABT24