A Text Classification Approach for the Automatic Detection of Twitter Posts Containing Self-reported COVID-19 Symptoms
Keywords: Text Classification, BERT-based Model, Social Media, Twitter, COVID-19, Machine Learning, Natural Language Processing
TL;DR: NLP-based framework using Twitter data for the automatic detection of tweets containing symptoms self-reported by COVID-19-positive patients
Abstract: Social media presents a potentially useful resource for conducting automated surveillance of the spread of COVID-19. As part of our efforts to establish a social media-based surveillance system, we describe the development and evaluation of a natural language processing and machine learning framework that uses Twitter data to automatically detect tweets containing symptoms self-reported by COVID-19-positive patients. We modeled tweet-level symptom detection as a binary classification task. We used annotated data from a past study, which includes posts from users who self-identified as having tested positive for COVID-19 and discussed their symptoms over multiple tweets. We trained a BERT-based classifier to automatically distinguish tweets from these users that mention COVID-19-related symptoms (positive tweets) from those that do not (negative tweets). The classifier was evaluated using the F1-score on a held-out test set, achieving F1-scores of 0.71 and 0.96 for the positive and negative classes, respectively. Following training and evaluation, we ran the classifier on unlabeled data from December 2019 to early February 2020 and qualitatively analyzed the classified tweets to examine its effectiveness.
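The per-class F1-scores reported in the abstract can be illustrated with a minimal sketch. The function name `per_class_f1` and the toy labels below are hypothetical and are not taken from the paper's code or data. The sketch shows only how an F1-score is computed separately for the positive (1) and negative (0) class of a binary classifier.

```python
from typing import Sequence


def per_class_f1(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1-score for one class, treating `label` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted as `label`
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as `label`
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed instances of `label`
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels): 1 = tweet mentions a symptom, 0 = it does not.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

f1_positive = per_class_f1(y_true, y_pred, label=1)
f1_negative = per_class_f1(y_true, y_pred, label=0)
print(f"F1 (positive class): {f1_positive:.2f}")
print(f"F1 (negative class): {f1_negative:.2f}")
```

Reporting the two classes separately matters when one class is much rarer than the other, since a single aggregate score can hide weaker performance on the minority class.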