Weibo-COV: A Large-Scale COVID-19 Social Media Dataset from WeiboDownload PDF

Aug 19, 2020 (edited Oct 10, 2020)EMNLP 2020 Workshop NLP-COVID SubmissionReaders: Everyone
  • Keywords: Large-scale Social Media Dataset, COVID-19, Chinese
  • TL;DR: a large-scale COVID-19 social media dataset built by a novel method
  • Abstract: With the rapid development of COVID-19 around the world, people are requested to maintain “social distance” and “stay at home”. In this scenario, extensive social interactions transfer to cyberspace, especially on social media platforms like Twitter and Sina Weibo. People generate posts to share information, express opinions and seek help during the pandemic outbreak, and these kinds of data on social media are valuable for studies to prevent COVID-19 transmissions, such as early warning and outbreaks detection. Therefore, in this paper, we release a novel and fine-grained large-scale COVID-19 social media dataset collected from Sina Weibo, named Weibo-COV, contains more than 40 million posts ranging from December 1, 2019 to April 30, 2020. Moreover, this dataset includes comprehensive information nuggets like post-level information, interactive information, location information, and repost network. We hope this dataset can promote studies of COVID-19 from multiple perspectives and enable better and rapid researches to suppress the spread of this pandemic.
7 Replies