Abstract: The COVID-19 pandemic had a profound global impact, necessitating a comprehensive understanding of public sentiment and reactions.
Although many public COVID-19 datasets exist, some with volumes reaching 100 billion, they suffer from a lack of labeled data or from coarse-grained sentiment labels. In this paper, we introduce FineCOVIDSen, a novel fine-grained sentiment analysis dataset tailored for COVID-19 tweets. It covers ten fine-grained sentiment categories across five languages, and each tweet may carry more than one label. The dataset includes 10,000 annotated English tweets and 10,000 annotated Arabic tweets, as well as 30,000 Spanish, French, and Italian tweets translated from the English tweets. It also comprises more than 105 million unlabeled tweets collected from March 1 to May 15, 2020. To support accurate fine-grained sentiment classification, we fine-tune pre-trained transformer-based language models on the labeled tweets. Beyond this, our study provides a detailed analysis and unveils intriguing insights into how the emotional landscape evolves over time across languages, countries, and topics, together with a case study on the predicted results for the unlabeled data. Our dataset and code are publicly available in an anonymous GitHub repository (https://anonymous.4open.science/r/FineCovidSen-5F96). We also evaluate the availability of our dataset using ChatGPT. We hope that this work will promote more fine-grained sentiment analysis of complex events in the NLP community.
Paper Type: long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Contribution Types: Data resources, Data analysis
Languages Studied: English, Arabic, Spanish, French, Italian
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.