Keywords: symptoms, twitter, unsupervised graph-based lexical expansion, social media, covid-19, bert
TL;DR: We present an unsupervised graph-based method for finding context-specific words and texts (e.g. symptoms) in large imbalanced corpora (e.g. tweets with #COVID-19), useful for syndromic surveillance of diseases whose pathology is evolving.
Abstract: In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COVID-19, we also test whether the proposed approach generalizes to the problem of detecting Adverse Drug Reactions (ADRs). We find that the approach, applied to Twitter data, can detect symptom mentions substantially before they are reported by the Centers for Disease Control and Prevention (CDC).