Data Augmentation for Rumor Detection Using Context-Sensitive Neural Language Model With Large-Scale Credibility Corpus

Sooji Han; Jie Gao; Fabio Ciravegna

Data Augmentation for Rumor Detection Using Context-Sensitive Neural Language Model With Large-Scale Credibility Corpus

Sooji Han, Jie Gao, Fabio Ciravegna

Published: 17 Apr 2019, Last Modified: 05 May 2023LLD 2019Readers: Everyone

Keywords: Rumor Detection, Data Augmentation, Social Media, Neural Language Models, Weak Supervision

TL;DR: We propose a methodology of augmenting publicly available data for rumor studies based on samantic relatedness between limited labeled and unlabeled data.

Abstract: In this paper, we address the challenge of limited labeled data and class imbalance problem for machine learning-based rumor detection on social media. We present an offline data augmentation method based on semantic relatedness for rumor detection. To this end, unlabeled social media data is exploited to augment limited labeled data. A context-aware neural language model and a large credibility-focused Twitter corpus are employed to learn effective representations of rumor tweets for semantic relatedness measurement. A language model fine-tuned with the a large domain-specific corpus shows a dramatic improvement on training data augmentation for rumor detection over pretrained language models. We conduct experiments on six different real-world events based on five publicly available data sets and one augmented data set. Our experiments show that the proposed method allows us to generate a larger training data with reasonable quality via weak supervision. We present preliminary results achieved using a state-of-the-art neural network model with augmented data for rumor detection.

3 Replies

Loading