Topic Modeling for Tracking COVID-19 Communication on Twitter

Published: 01 Jan 2022, Last Modified: 01 Oct 2024 | ICIST 2022 | CC BY-SA 4.0
Abstract: In this study, we analyze trends in COVID-19 related communication in the Croatian language on Twitter. First, we prepare a dataset of 147,028 tweets about COVID-19 posted during the first three waves of the pandemic, and then perform an analysis in three steps. In the first step, we train an LDA model and calculate the coherence values of the topics. We identify seven topics and report the ten most frequent words for each topic. In the second step, we analyze the proportion of tweets in each topic and report how these proportions change over time. In the third step, we study the spreading properties of each topic. The results show that all seven topics are evenly distributed across the three pandemic waves. The topic “vaccination” stands out, with its share rising from 14.6% of tweets in the first wave to 25.7% in the third wave. The obtained results contribute to a better understanding of pandemic communication on social media in Croatia.
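The second step of the analysis (the proportion of tweets per topic in each wave) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `topic_shares_by_wave`, the topic labels other than "vaccination", and the toy assignments are hypothetical, and it assumes each tweet has already been assigned one dominant topic and one wave.

```python
from collections import Counter, defaultdict


def topic_shares_by_wave(assignments):
    """Return {wave: {topic: percentage of that wave's tweets}}.

    `assignments` is an iterable of (wave, topic) pairs, one per tweet.
    Assumption: each tweet carries a single dominant topic label.
    """
    counts = defaultdict(Counter)
    for wave, topic in assignments:
        counts[wave][topic] += 1

    shares = {}
    for wave, topic_counts in counts.items():
        total = sum(topic_counts.values())
        # Convert raw counts to percentages within the wave.
        shares[wave] = {t: 100.0 * n / total for t, n in topic_counts.items()}
    return shares


# Toy, made-up assignments for illustration only (not the paper's data).
toy = [
    (1, "vaccination"), (1, "measures"), (1, "measures"), (1, "cases"),
    (3, "vaccination"), (3, "vaccination"), (3, "measures"), (3, "cases"),
]
shares = topic_shares_by_wave(toy)
print(shares[1]["vaccination"], shares[3]["vaccination"])  # 25.0 50.0
```

Comparing the same topic's percentage across waves, as in the last line, is the kind of comparison behind the reported change for "vaccination".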