\section{Introduction}
\label{sec:intorduction}

The first officially known outbreak of Covid-19 began in Wuhan, China, at the end of 2019 \citep{site:who2021}. Owing to the rapid spread of the coronavirus and the number of lives lost to the infection, the Covid-19 pandemic has massively impacted our daily lives, interactions, behaviors, and routines. Although further outbreaks are possible and the future of the pandemic is uncertain \citep{Bonnevie2021}, several vaccines are currently available as measures for controlling the disease.

As mentioned by \cite{ThanhLe2020}, factors other than the quality of the vaccines, such as public support and trust in authorities, are essential to the effectiveness of vaccination programs. However, such treatments, particularly those offered as an emergency response to a rapidly spreading pandemic, are sometimes met with reservation and reluctance \citep{TROIANO2021245}. Therefore, given the overall aim of global immunization and of preventing the social consequences of the pandemic, there is both great potential and a clear need for studies that analyze supportive and critical viewpoints on mass vaccination. Understanding critical viewpoints and their rationales can help persuade a larger share of society to get vaccinated and increase the success rate of such programs worldwide.

Nowadays, social media platforms play a significant role in our lives. People communicate, express their feelings and passions, and share or follow the latest news via these platforms. Investigating social media can shed light on people's attitudes toward a given topic and on how their opinions evolve over time. In recent years, Twitter, one of the most powerful social networks, has been a key channel for information dissemination. Each Twitter user can broadcast a message containing any desired content, as long as they abide by the platform's safety, privacy, and authenticity rules\footnote{\url{https://help.twitter.com/en/rules-and-policies/twitter-rules}}.

Although content on Twitter is publicly accessible, conducting research on tweets requires a detailed plan for acquiring and analyzing relevant data. This paper presents a practical approach for mining and classifying Persian tweets and users regarding Coronavirus vaccination, leading to a detailed analysis of supportive and critical public attitudes toward vaccination in Iran. Moreover, our study focuses on Persian, a low-resource language that has received little attention in social studies compared to English. In addition, this research provides insights into the relationship between different events and social media reactions to them.
The contributions of this paper can be summarized as follows:
\begin{itemize}
    \item We describe a topic modeling approach combined with a keyword-based method for extracting Persian tweets related to vaccination.
    \item We apply transformer-based machine learning techniques for tweet classification.
    \item We conduct an emotion analysis of happiness and anger using a labelled dataset of Persian words.
    \item We quantify different supportive and critical vaccination themes extracted from tweets.
    \item We investigate users' connections before and after the initiation of vaccination.
\end{itemize}

The remainder of this paper is organized as follows: Section \ref{sec:related-work} gives a brief synopsis of previous related work. Section \ref{sec:data} explains how Persian tweets relevant to Covid-19 were collected. In Section \ref{sec:methods}, we present the preprocessing methodology as well as our approaches for obtaining tweets related to vaccination, and introduce a strategy to classify the tweets into three classes: negative, positive, and neutral. Techniques used for emotion analysis and further evaluations, such as extracting vaccine themes and the user study, are also explored in this section. Section \ref{sec:results} analyzes the classified tweets and extracted themes, and also presents several analyses of the Covid-19 timeline, user groups and influential users, and the overall emotion analysis results. Finally, Section \ref{sec:conclusion} concludes the paper and outlines future research directions.



\section{Related Work}
\label{sec:related-work}

Given the diversity, richness, and availability of Twitter data, several studies have used tweets to analyze the impact of Covid-19 on societies and social media platforms. According to the Covid-19 Data Explorer\footnote{\url{https://ourworldindata.org/explorers/coronavirus-data-explorer}}, Iran was one of the first countries to be affected by Covid-19; nevertheless, only a few analyses have investigated Iranians' opinions toward Coronavirus and vaccination. \cite{HOSSEINI2020} performed one of the early studies, gauging responses to ongoing events by categorizing Persian tweets into different classes and demonstrating how the reactions evolved over time. \cite{SHOKROLLAHI2021} provides a Post-structuralist Discourse Analysis (PDA) of the Covid-19 phenomenon in Persian society, using social network graphs to cluster and explore influencers, and also conducts a sentiment analysis of Persian tweets related to Covid-19. Lastly, \cite{NEZHAD2022} presented a sentiment analysis approach to assess the Persian community's position toward domestic and imported Coronavirus vaccines.

Generally, topic detection can help structure an extensive data collection by grouping records into different classes, and many topic modeling techniques are available for this purpose. \cite{LYE2021E24435} aims to identify the topics of tweets related to Covid-19, fetched with relevant keywords, using Latent Dirichlet Allocation (LDA) topic modeling developed by \cite{LDA2003993}. Similarly, \cite{WICKE2021} employs LDA to illustrate how the subjects linked with the pandemic's growth change over time. In contrast, as the first step in classifying Persian tweets, we considered both LDA and Gibbs Sampling for Dirichlet Multinomial Mixture (GSDMM) from \cite{GSDMM2014233} and compared their results to extract the most relevant topics. GSDMM is a modified LDA technique mainly used for short text topic modeling (STTM) tasks; it assumes a single topic for each document rather than a probability distribution over all potential topics, as in the original LDA.

One important factor in analyzing public opinions toward vaccination is to explore trends and reactions during the pandemic. According to the temporal evolution study of different emotional categories and influencing factors in \cite{CHOPRA2021}, expressions of doubt about vaccination attracted the most health-related conversation in all the countries studied. Furthermore, \cite{Thelwall_Kousha_Thelwall_2021} applies a manual content analysis to a small portion of vaccine-hesitant Coronavirus tweets in English to extract the major themes discussed regarding hesitancy. Likewise, the quantifications introduced in \cite{BONNEVIE202112} compare vaccine-critical posts on Twitter before and after the spread of Covid-19 in the United States, showing a significant increase in vaccine disapproval, especially in areas related to health authorities, vaccine ingredients, and research trials. Moreover, in \cite{BONNEVIE2020S326}, vaccine opposition themes are manually coded, after which the misinformation in each theme and the top influencers are identified. The results show that prominent influencers appear to be well coordinated in disseminating misinformation. Apart from vaccine trends, another direction of our study is to classify vaccine-related tweets into three categories and discuss the evolution of each position (critical, supportive, and neutral) during the pandemic.

In addition to vaccination topics, several studies have conducted sentiment analysis of tweets about Covid-19 vaccination. One example is \cite{WICKE2021}, which performs sentiment analysis based on the Pattern library, using a dictionary of manually tagged adjectives with sentiment polarity values \citep{JMLR:v13}. Similarly, \cite{YOUSEFINAGHANI2021256} utilizes the Valence Aware Dictionary and sEntiment Reasoner (VADER), a lexicon and rule-based sentiment analysis tool available in Python, to assign a sentiment polarity to every tweet \citep{Hutto_Gilbert_2014}. Furthermore, in a recent study, \cite{NEZHAD2022} applies a deep learning model reinforced with a sarcasm detection approach to achieve high accuracy for Persian tweets.

Although several projects have addressed vaccine theme identification and sentiment analysis, many plausible analyses in these areas have received less attention, especially in Persian, which is a low-resource language. Previous studies have usually concentrated on vaccine-opposition themes, whereas we explore both supportive and opposing themes and demonstrate how they develop over time using a grounded theory methodology devised by \cite{grounded2014}. Furthermore, we performed emotion analysis over the prominent vaccination opinions, i.e., positive, negative, and neutral, using our tagged emotion dataset of Persian words.

Regarding the study of users involved in Covid-19-related conversations, one of the first studies was carried out by \cite{BONNEVIE2020S326}. By analyzing ``Top Authors'' and user engagement, they found that vaccine opposition and misinformation do not come from a diverse distribution of users. Additionally, \cite{YOUSEFINAGHANI2021256} classified Twitter users into three categories, namely pro-vaccine, anti-vaccine, and neutral, and determined the group to which each user belongs. A similar study for Turkish Twitter was conducted by \cite{durmaz2022}; a key point of their work is that they identified anti-vaccine influencers both before and after the pandemic. In the present study, we use a robust method to assign each user to one of the positions mentioned above and study user interactions before and after the start of public vaccination in Iran.


\section{Data}
\label{sec:data}
As previously stated, this study aims to analyze Persian tweets about vaccination to give insight into public opinion toward Coronavirus vaccines in Iran. To this end, we first need to collect relevant data for processing. The data acquisition and preprocessing procedures are described below, following the workflow shown in Figure \ref{fig:data_procedure}.

\begin{figure}[ht] 
\centering
\includegraphics[scale=.45]{pics/data_procedure1.png}
\caption{Data Acquisition and Preprocessing} 
\label{fig:data_procedure} 
\end{figure} 


\subsection{Data Acquisition}
To collect Persian tweets and their respective users, we did not just focus on our task at hand; instead, we gathered a comprehensive dataset to be potentially utilized for further studies. This dataset contains 709,460,922 tweets and 6,661,480 active users from Jan. 2012 to Dec. 2021.


For this purpose, we used the Twitter Intelligence Tool (TWINT), an advanced Twitter scraping tool developed by \cite{site:TWINT} that allowed us to gather Twitter users' profiles and tweets. We modified TWINT so that we could extract user and tweet information for every hour and save it in Elasticsearch. We chose Elasticsearch as our database and search engine because of its robustness and scalable architecture.

In the next step, to separate Covid-19-related tweets, we extracted tweets from Feb. 2020, when the first infected case in Iran was publicly announced, up until Dec. 2021 that contained at least one of the following keywords in Persian: \textbf{Corona, Covid, vaccine, and quarantine}. More information on the number of tweets for each keyword is provided in Table \ref{tab:covid_tweets}.

\begin{table}[ht] 
\caption{Keywords and Related Tweets} 
\centering
\begin{tabular}{|c|c|}
\hline\hline
 Keywords & \  Tweet Count \\ [0.25ex] 
\hline
Corona & 2,673,287 \\
Covid & 72,987 \\
Vaccine & 958,250 \\
Quarantine & 398,142 \\
\hline
At least one keyword & 3,825,742 \\
\hline\hline  
\end{tabular} 
\label{tab:covid_tweets} 
\end{table}

The extracted information contains features for users and tweets. We store the list of users mentioned in a tweet and whether a tweet is a reply to another one, along with counts of users' interactions with the tweet. For example, we only saved the number of likes per tweet, not the list of users who liked the tweet, since we were only interested in the quantity of this statistic. Similarly, the numbers of followers and followings were gathered for each user, but the lists of followers and followings are not available. Table \ref{tab:tweet_features} and Table \ref{tab:user_features} describe the main properties of the tweet and user datasets, respectively.

\begin{table}[ht] 
\caption{Description of Tweet Features} 
\centering
    \begin{tabular}{|p{0.23\linewidth} | p{0.67\linewidth}|}
\hline\hline
 Feature Name & \  Description \\ [0.25ex] 
\hline
Tweet ID & Unique ID for every tweet \\
User ID & Unique ID employed by Twitter for the owner of the tweet \\
Conversation ID & ID employed by Twitter for the conversation \\
Retweet Count & Number of retweets \\
Reply Count & Number of replies \\
Like Count & Number of likes \\
Reply to & User ID of the replied-to tweet, if the current tweet is a reply to another tweet \\
Mentions & List of mentioned user IDs \\
Created at & Creation time of the tweet \\
Source & Twitter Source (Android, iPhone, iPad, Web App) \\
Hashtags & List of hashtags in the tweet \\
URLs & List of URLs in the tweet \\
Tweet & Tweet content \\
\hline\hline  
\end{tabular} 
\label{tab:tweet_features} 
\end{table} 

\begin{table}[ht] 
\caption{Description of User Features} 
\centering
    \begin{tabular}{|p{0.23\linewidth} | p{0.67\linewidth}|}
\hline\hline
 Feature Name & \  Description \\ [0.25ex] 
\hline
ID & Unique ID employed by Twitter for every user \\
Username & The name that identifies the user \\
Bio & Biography of the user \\
Location & Location of the user \\
URL & Link in the user account \\
Joined Time & Time of account creation \\
Tweet Count & Number of tweets \\
Like Count & Number of total likes\\
Followers & Number of followers\\
Followings & Number of followings\\
Private & Whether user account is private\\
Verified & Whether user account is verified\\
\hline\hline  
\end{tabular} 
\label{tab:user_features} 
\end{table}

\subsection{Data Preprocessing}
\label{subsec:dp}
Inspired by \cite{Philippines2021}, which performed comprehensive data preprocessing on tweets in the Philippines, we devised a multi-stage pipeline for cleaning and preprocessing. The preprocessing scheme is shown in Figure \ref{fig:data_procedure}, and its stages are detailed below:

\medbreak
\subsubsection{Removing Duplicates}
Repeated characters are abundant, notably in emojis, vowels, and stressed letters within a word. To remove this inconsistency, we replaced any character repeated more than twice in a row with a single instance of that character. For instance, the word ``Helloooo!'' is replaced with ``Hello!'' during this phase. Furthermore, we substituted consecutive identical emojis with a single one. We were aware that this step could affect emotion analysis in general, since the negative stance conveyed by ``I haaattee this!'' is probably much stronger than that of ``I hate this!''; however, because our emotion analysis approach is word-based, this technique did not affect our analysis.

Afterward, we removed duplicate records with identical tweet content, as well as one-word tweets, because the latter often do not convey any meaningful concept. This phase eliminated 286,546 records.
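
The two steps of this phase can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names are hypothetical, the regular expression only collapses runs of three or more identical code points (so emojis repeated exactly twice, or emojis built from several code points, would need extra handling), and duplicates are detected by exact text match after collapsing.

```python
import re

# A character repeated three or more times in a row (i.e. more than two).
_REPEATS = re.compile(r"(.)\1{2,}", flags=re.DOTALL)


def collapse_repeats(text: str) -> str:
    """Replace every run of 3+ identical characters with a single character."""
    return _REPEATS.sub(r"\1", text)


def drop_duplicates_and_one_word(tweets):
    """Keep the first copy of each tweet text and skip tweets with fewer than two words."""
    seen = set()
    kept = []
    for tweet in tweets:
        text = collapse_repeats(tweet).strip()
        if len(text.split()) < 2 or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept
```

Note that runs of exactly two characters are left untouched under this rule, so \texttt{collapse\_repeats("I haaattee this!")} yields \texttt{"I hattee this!"} rather than a dictionary word.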

\medbreak
\subsubsection{Text Cleaning}
We used the Clean-Text library in Python \citep{site:CLEAN_TEXT} in addition to our customized techniques for data cleaning. Clean-Text provides a cleaner text representation: we employed this library to fix various Unicode errors and remove URLs, phone numbers, emails, and currency symbols. We also removed HTML tags as well as meaningless characters and punctuation.

\medbreak
\subsubsection{Normalization}
For normalization, we utilized the Hazm library, which is designed for processing Persian text \citep{site:HAZM}. We used the Hazm Normalizer to unify different classes of terms.

\medbreak
\subsubsection{Removing Stopwords}
We defined a set of Persian stopwords to be removed from tweets by combining the Hazm stopwords dataset with the Persian stopwords defined in \cite{site:PERSIAN_STOPWORDS}. Afterward, we inspected every word in these two sets and removed those that might be relevant to Covid-19 and vaccination. Finally, we evaluated the most frequent words in Covid-19-related tweets and checked whether they refer to any meaningful notion; if not, we added them to our stopwords set.
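
The construction of the stopword set amounts to a few set operations, sketched below in Python. This is only an illustration under assumptions: the function names and arguments are hypothetical, and the manual inspection steps described above are represented simply as the two input collections \texttt{domain\_terms} and \texttt{frequent\_noise}.

```python
def build_stopwords(hazm_stopwords, extra_stopwords, domain_terms, frequent_noise):
    """Combine two stopword lists, drop domain-relevant words, add frequent noise words."""
    combined = set(hazm_stopwords) | set(extra_stopwords)
    combined -= set(domain_terms)      # words relevant to Covid-19 or vaccination stay in tweets
    combined |= set(frequent_noise)    # frequent words judged to carry no meaningful notion
    return combined


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stopword set, preserving order."""
    return [token for token in tokens if token not in stopwords]
```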

\medbreak
\subsubsection{Lemmatization}
We also performed lemmatization on our Persian dataset using the Hazm lemmatizer in order to reduce inflected and variant forms to their base form. Because lemmatization can change or even invert the meaning of words (especially when turning negative verb forms into infinitives), we created two datasets, one with lemmatization (LEM) and one without it (N-LEM), to compare the effect of lemmatization on the topic modeling results and subsequent steps.

\section{Methods}
\label{sec:methods}

To filter tweets relevant to vaccination, we used a topic modeling approach combined with a keyword-based search. We also applied a transformer-based machine learning technique to classify vaccine-related tweets into three major groups (vaccine-critical, vaccine-supportive, and neutral).

The research methodology is shown in Figure \ref{fig:method_workflow}.

\begin{figure}[ht] 
    \centering
    \includegraphics[scale=.5]{pics/method-workflow2.png}
    \caption{Workflow of Twitter Analysis toward Covid-19 Vaccination} 
    \label{fig:method_workflow} 
\end{figure}



\subsection{Topic Modeling}

Topic modeling, also referred to as probabilistic clustering, is an approach to structuring a large dataset by dividing it into smaller, more interpretable, and spatially separated clusters. Among the many topic modeling methods available, we chose LDA, an unsupervised machine learning algorithm and the most widely used technique, and GSDMM, an approach designed for short texts. We applied these two topic modeling techniques to our dataset and compared their results to see which one performs better.
 
We used a combination of two criteria to assess the performance of our topic modeling algorithms:
\begin{enumerate}
    \item Coherence measure ($C_v$) by \cite{CV2010100}: topic coherence measures score a topic by the degree of semantic similarity between its high-scoring terms. These metrics help distinguish topics that are semantically interpretable from those that are not.
    
    \item Human judgment: similar to what \cite{NIPS2009_f92586a2} proposed, we carried out word and topic intrusion tasks, focusing on the meaning of the words in each topic to assess the interpretability of each group.

\end{enumerate}

To obtain the most reasonable topic models, we evaluated several factors over a sample of 100,000 tweets. First, we compared the LEM dataset with N-LEM based on the coherence value across multiple hyper-parameter settings and word representations. The LEM dataset outperformed N-LEM by an average of 2.3\% in $C_v$ score over 25 executions. For this reason, we used the LEM dataset for the rest of the topic modeling process.

Next, we compared the Bag-of-Words (BoW) and Term Frequency/Inverse Document Frequency (TF-IDF) word representation techniques. For this purpose, we filtered out extreme tokens that appeared in fewer than 15 tweets or in more than 50\% of all tweets and kept only the top 100,000 tokens for topic modeling. Averaged over 20 executions, BoW results were 2.2\% better than TF-IDF.
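
The token filtering described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used here: the function and parameter names are hypothetical (they mirror the \texttt{no\_below}, \texttt{no\_above}, and \texttt{keep\_n} arguments of \texttt{Dictionary.filter\_extremes} in the gensim library, which offers the same kind of filtering), and ties among equally frequent tokens are broken alphabetically by assumption.

```python
from collections import Counter


def filter_extremes(tokenized_docs, no_below=15, no_above=0.5, keep_n=100_000):
    """Return the vocabulary kept after document-frequency filtering."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each token once per document
    kept = [
        (token, df)
        for token, df in doc_freq.items()
        if df >= no_below and df <= no_above * n_docs
    ]
    # Most frequent tokens first; alphabetical order breaks ties.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [token for token, _ in kept[:keep_n]]


def bag_of_words(tokens, vocabulary):
    """Count the occurrences of in-vocabulary tokens in one document."""
    vocab = set(vocabulary)
    return Counter(token for token in tokens if token in vocab)
```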

Finally, we tuned the LDA and GSDMM hyper-parameters to find the best results for each method. The tuned parameters are described below.

LDA parameters:
\begin{itemize}
    \item $NT$: The number of topics to be retrieved from the training corpus.
    \item $NP$: Number of passes through the corpus during training.
    \item $\alpha$: A number for a symmetric prior over document-topic distribution.
    \item $CS$: Number of documents/tweets in each training chunk.
\end{itemize}

GSDMM parameters:
\begin{itemize}
    \item $NT_G$: The upper limit for the number of topics. 
    \item $NI$: The upper limit for the number of iterations to perform.
    \item $\alpha_G$: A parameter ranging from 0 to 1, controlling records' affinity for a larger cluster.
    \item $\beta_G$: A parameter ranging from 0 to 1, controlling records' affinity for a more homogeneous cluster.
\end{itemize}

We evaluated results for $NT$ (and $NT_G$) between 5 and 10 and $NP$ (and $NI$) between 6 and 12 for the LDA and GSDMM models. Based on the $C_v$ coherence measures shown in Table \ref{tab:lda_gsdmm_coherences} and human judgment, the LDA model outperformed GSDMM on our dataset.

\begin{table}[ht] 
\caption{Best Results Gained from Topic Modeling}
\centering
    \begin{tabular}{|c|c|c|}
\hline\hline
 Topic model & \  Number of Topics & \  Coherence ($C_v$) \\ [0.25ex] 
\hline
LDA & 10 & 52.72\% \\
GSDMM & 9 & 42.46\% \\
\hline\hline  
\end{tabular} 
\label{tab:lda_gsdmm_coherences} 
\end{table}

After finding the best model for the tweets using the LDA technique, we manually labeled each group according to the concept perceived from each cluster. More information about these topics is provided in Table \ref{tab:lda_topic_details}.

\begin{table}[ht] 
\caption{Final Topics} 
\centering
    \begin{tabular}{|c|c|c|}
\hline\hline
 Topic description & \  Number of Tweets & \  \% of all tweets \\ [0.25ex] 
\hline
Religious and governmental & 257,314 & 7.27\% \\
Relatives and mourning & 370,644 & 10.47\% \\
Vaccination opinions & 527,294 & 14.90\% \\
Regional news & 293,378 & 8.29\% \\
Reports and statistics & 188,088 & 5.31\% \\
Symptoms & 501,034 & 14.16\% \\
Political and dissatisfaction & 161,774 & 4.57\% \\
Quarantine and education & 456,551 & 12.90\% \\
Vaccination (news, reports) & 424,406 & 12.00\% \\
Political and financial & 358,713 & 10.13\% \\
\hline\hline  
\end{tabular} 
\label{tab:lda_topic_details} 
\end{table}

\subsection{Vaccine-related Tweets}

Keyword-based search is usually a practical way to obtain a required subset; however, it relies only on the presence of words from a list and therefore ignores context and sentence meaning. To deal with this challenge and obtain the tweets most relevant to Covid-19 vaccination, we developed a hybrid approach that merges the results of the keyword-based technique with our topic modeling outcomes.

According to our topic modeling results, two groups were related to vaccination, i.e., vaccination opinions and vaccination news and reports. First, we extracted tweets with a high probability of belonging to one of these two clusters, where a probability of 0.5 or more is considered high. Based on this criterion, 499,228 tweets were extracted from the dataset.

Then, we defined a series of vaccine-related keywords, whose English translations are as follows: \textbf{vaccine, vaccination, Astra, AstraZeneca, Pfizer, Moderna, Sputnik, Covaxin, Sinopharm}. The rest of the Covid-19-related tweets were checked against these keywords, and 538,212 of them contained at least one keyword. Consequently, we stored 1,037,440 tweets related to vaccination for further studies.
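
A minimal Python sketch of this hybrid selection is given below. It rests on several assumptions that are not stated in the text: each tweet is represented as a dictionary with hypothetical keys \texttt{id}, \texttt{text}, and \texttt{topic\_probs}; the threshold is applied to the larger of the two vaccination topic probabilities; and keywords are matched as plain substrings.

```python
def select_vaccine_tweets(tweets, vaccine_topics, keywords, threshold=0.5):
    """Split vaccine-related tweet IDs into topic-based and keyword-based matches.

    tweets: iterable of dicts with 'id', 'text', and 'topic_probs' (topic -> probability).
    vaccine_topics: non-empty collection of topic labels considered vaccine-related.
    """
    by_topic, by_keyword = [], []
    for tweet in tweets:
        # Assumption: use the larger of the vaccination topic probabilities.
        topic_prob = max(tweet["topic_probs"].get(t, 0.0) for t in vaccine_topics)
        if topic_prob >= threshold:
            by_topic.append(tweet["id"])
        elif any(keyword in tweet["text"] for keyword in keywords):
            # Only tweets not selected by topic are checked against the keywords.
            by_keyword.append(tweet["id"])
    return by_topic, by_keyword
```

Because the keyword check is applied only to tweets not already selected by topic, the two lists are disjoint, which is consistent with the reported total being the sum of the two counts ($499{,}228 + 538{,}212 = 1{,}037{,}440$).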


\subsection{Vaccine-related Tweets Classification}

After obtaining the vaccine-related tweets, we aimed to classify them into three major groups: vaccine-critical, neutral, and vaccine-supportive. To achieve this, we first manually labeled 6000 tweets using the grounded theory approach. For the first 1000 items of the extracted dataset, the first two authors separately labeled the tweets into the three categories mentioned above. Then, the two labeled datasets were compared against each other using Cohen's Kappa metric, yielding an agreement of 78 percent. After a discussion of the tweets that did not receive the same label, an agreement of 90 percent was reached over the first 1000 labeled tweets. Afterward, the remaining part was split into two datasets of 2500 tweets, each labeled by only one person. The resulting label distribution is shown in Table \ref{tab:polar_dist}:

\begin{table}[ht] 
\caption{Polarity Distribution of Hand-Labeled Dataset} 
\centering
    \begin{tabular}{|c|c|c|}
\hline\hline
 Position & \  Count & \  Percentage \\ [0.25ex] 
\hline
Vaccine-Critical & 1735 & 28.9\% \\
Neutral & 2611  & 43.5\% \\
Vaccine-Supportive & 1654  & 27.5\% \\
\hline\hline  
\end{tabular} 
\label{tab:polar_dist} 
\end{table}
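
For reference, Cohen's Kappa for two annotators can be computed as in the following Python sketch, which implements the standard definition $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. The function name is hypothetical, and the sketch is not the code used to obtain the agreement values reported above.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single, identical label throughout
    return (observed - expected) / (1.0 - expected)
```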

Subsequently, the manually labeled data were utilized for vaccine opinion classification. We applied combinations of the data preprocessing factors listed in Table \ref{tab:dataset_extensions}. For text cleaning and stopword removal, we considered three different criteria, i.e., extreme, moderate, and no filtering. The details of these three criteria are as follows:

\begin{itemize}
    \item Extreme: Applying all the methods mentioned in Section \ref{subsec:dp}.
    \item Moderate: Allowing the presence of vaccine-related words, for which we reduced the size of the stopwords set by 30\%. Also, for the text cleaning part, punctuation, numbers, and conversational forms were kept in tweets.
    \item No Filtering: Keeping tweet contents intact.
\end{itemize}


For duplicate removal and lemmatization, in contrast, we considered only two options: applying them or not. Combining all options ($2 \times 3 \times 2 \times 3$), we created 36 different datasets from our original vaccine-related tweets in this stage.


\begin{table}[ht] 
\caption{Dataset Extension Criteria} 
\centering
    \begin{tabular}{|c|c|c|}
\hline\hline
 Criteria & \  States & \  \# of States \\ [0.25ex] 
\hline
Duplicate Removal & Keep / Remove & 2 \\
Text Cleaning & Extreme / Moderate / No Filter & 3 \\
Lemmatization & Apply / Ignore & 2 \\
Stopword Elimination & Extreme / Moderate / No Filter & 3 \\
\hline\hline  
\end{tabular} 
\label{tab:dataset_extensions} 
\end{table}
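
The 36 preprocessing configurations are the Cartesian product of the states in Table \ref{tab:dataset_extensions}. A short Python sketch enumerating them is shown below; the dictionary keys and state names are hypothetical identifiers chosen for illustration.

```python
from itertools import product

# States of each criterion, following the dataset extension table.
CRITERIA = {
    "duplicate_removal": ("keep", "remove"),
    "text_cleaning": ("extreme", "moderate", "none"),
    "lemmatization": ("apply", "ignore"),
    "stopword_elimination": ("extreme", "moderate", "none"),
}


def dataset_configurations(criteria=CRITERIA):
    """Enumerate every combination of preprocessing states as a dict."""
    names = list(criteria)
    return [dict(zip(names, states)) for states in product(*criteria.values())]
```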

Finally, we employed transformer-based machine learning techniques for our vaccine-related tweet classification. To find the best result, we fine-tuned and compared a series of pre-trained models that use a masked language modeling (MLM) objective. The models we used are discussed below:

\medbreak
\subsubsection{Bidirectional Encoder Representations from Transformers (BERT)}

BERT, introduced in \cite{BERT2018}, applies the bidirectional training of the transformer, a popular attention model, to language modeling. This contrasts with previous approaches, which read a text sequence from left to right or combined left-to-right and right-to-left training. We initially employed the BERT-base and BERT-large models. Then, we utilized ParsBERT from \cite{ParsBERT}, a monolingual language model based on Google's BERT architecture, pre-trained on large Persian corpora with more than 3.9M documents, 73M sentences, and 1.3B words. As with the previous models, we fine-tuned ParsBERT v3.0 and compared the results with BERT-base and BERT-large.

\medbreak
\subsubsection{Robustly Optimized BERT Pretraining Approach (RoBERTa)}

\cite{ROBERTA2019} trained BERT with more input data and epochs and came up with RoBERTa, showing that both techniques help in achieving better results. Furthermore, this approach slightly improved the masking and data pretraining processes. First, we used the RoBERTa-base and RoBERTa-large models, as with the pre-trained BERT models. Next, we utilized Twitter-RoBERTa-base for sentiment analysis, which is trained on about 58M tweets and fine-tuned for sentiment analysis with the TweetEval benchmark from \cite{TWEETEVAL2020}. Finally, we assessed Persian RoBERTa, a model similar in idea to ParsBERT but based on the RoBERTa architecture.

\medbreak
\subsubsection{Lite BERT for Self-supervised Learning of Language Representations (ALBERT)}

ALBERT, introduced by \cite{ALBERT2019}, brought two significant innovations over BERT. First, it factorizes the embedding parameterization: ALBERT uses a small embedding size and then projects it to the transformer hidden size. Second, ALBERT shares all parameters between transformer layers. For our classification task, we employed the Persian ALBERT v3.0 model, which is provided in ParsBERT.

\medbreak
\subsubsection{Distilled Version of BERT (DistilBERT)}

Distillation, as mentioned by \cite{DISTILLATION2015}, is the procedure of training a small student model to mimic a larger teacher model as closely as possible, and DistilBERT was introduced based on this concept \citep{DISTILBERT2019}. To incorporate DistilBERT into the study, we utilized the Persian DistilBERT v3.0 model implemented by ParsBERT.

\medbreak
\subsubsection{Generalized Auto-regressive Pretraining for Language Understanding (XLNet)}

BERT has two main limitations. It corrupts the input with masks and therefore suffers from a discrepancy between pretraining and fine-tuning. In addition, BERT ignores the dependency between masked positions. To address these issues, \cite{XLNET2019} used a permutation language modeling idea to create XLNet. Furthermore, they employed some techniques for masking and using the position of the prediction token. We used the XLNet-base and XLNet-large pre-trained models to assess this architecture, evaluate the results, and compare them with other transformer-based models.

\medbreak
\subsubsection{Unsupervised Cross-lingual Representation Learning at Scale (XLM-R)}

In addition to monolingual models, we also fine-tuned and evaluated XLM-RoBERTa (XLM-R) from \cite{XLMR2019}, a transformer-based multilingual masked language model pre-trained on text in 100 languages. We used the XLM-RoBERTa-large model for this purpose.

\subsection{Emotion Analysis}

To analyze the emotion of vaccine-related tweets during the Covid-19 pandemic, we used a proprietary dataset, which provides the level of happiness and anger for a lexicon of 8,375 common Persian words found on Twitter. Six individuals participated in evaluating and labeling this dataset. Every word in the lexicon was assigned two numbers between 1 and 9, indicating the intensity of happiness and anger. A value of 5 refers to a neutral state, and higher numbers refer to more extreme emotions. This method is similar to the Hedonometer approach, proposed by \cite{HEDONOMETER2011}, for measuring expressed happiness in other languages. We calculated an average happiness and anger weight for each word in the dataset. Then we fitted the inverse of the normal distribution function to assign weights to each number between 1 and 9. The purpose of using this function was to highlight the effect of extremely emotional words.

Afterward, we used the dataset to scale up the emotion analysis from individual words to texts. In order to evaluate the weighted average level of anger and happiness, we used an algorithm (H-AVG), based on Hedonometer's proposal, which is as follows:

$$
h_{\mathrm{avg}}(T)=\frac{\sum_{i=1}^{N} h_{\mathrm{avg}}\left(w_{i}\right) \times \mathrm{freq}_{i}}{\sum_{i=1}^{N} \mathrm{freq}_{i}}
$$

$$
a_{\mathrm{avg}}(T)=\frac{\sum_{i=1}^{N} a_{\mathrm{avg}}\left(w_{i}\right) \times \mathrm{freq}_{i}}{\sum_{i=1}^{N} \mathrm{freq}_{i}}
$$

where $\mathrm{freq}_{i}$ is the frequency of the word $w_{i}$ (the $i$th word) in text $T$, and $N$ is the number of words present in $T$.
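
As a purely illustrative example (the scores below are hypothetical, are not taken from our lexicon, and ignore the reweighting step described above for simplicity), consider a text $T$ in which three lexicon words remain after filtering: $w_1$ with $h_{\mathrm{avg}}(w_1)=8$ occurring twice, $w_2$ with $h_{\mathrm{avg}}(w_2)=3$ occurring once, and $w_3$ with $h_{\mathrm{avg}}(w_3)=7$ occurring once. The weighted average then evaluates to

$$
h_{\mathrm{avg}}(T)=\frac{8 \times 2+3 \times 1+7 \times 1}{2+1+1}=\frac{26}{4}=6.5,
$$

so the repeated high-happiness word pulls the text score above the neutral value of 5. The anger score $a_{\mathrm{avg}}(T)$ is obtained in the same way from the anger weights.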

Before calculating the averages, we dropped every word not found in the emotion dataset. Furthermore, we removed all neutral words to focus on the more pronounced levels of happiness and anger in tweets. Next, we calculated the average happiness and anger of each tweet from the remaining words. To obtain more robust results, we then estimated average happiness and anger scores for the words missing from the emotion dataset, as shown below:

$$
h_{\mathrm{avg}}(w)=\frac{\sum_{i=1}^{M} h_{\mathrm{avg}}\left(T_{i}\right)}{\sum_{i=1}^{M} \mathrm{freq}_{i}}
$$

$$
a_{\mathrm{avg}}(w)=\frac{\sum_{i=1}^{M} a_{\mathrm{avg}}\left(T_{i}\right)}{\sum_{i=1}^{M} \mathrm{freq}_{i}}
$$

where $T_{i}$ is the $i$th text containing the word $w$, $M$ is the number of texts containing $w$, and $\mathrm{freq}_{i}$ here denotes the frequency of $w$ in $T_{i}$.
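
For illustration only, with hypothetical values: suppose a word $w$ that is missing from the lexicon occurs exactly once in each of $M=3$ texts whose happiness scores are $6$, $7$, and $5$. Reading the frequency term in the denominator as the number of occurrences of $w$ in each text, the estimate becomes

$$
h_{\mathrm{avg}}(w)=\frac{6+7+5}{1+1+1}=\frac{18}{3}=6.
$$

In other words, when $w$ occurs once per text, its score reduces to the plain mean of the scores of the texts in which it appears.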

Later, we utilized H-AVG again to compute the average happiness and anger per day during the Covid-19 pandemic and compared the results with Covid-19-related events in Iran. The results are reported in Section \ref{sec:results}.

\subsection{Vaccine Themes}

After achieving an acceptable result for the classification of vaccine-related tweets (detailed in Section \ref{subsec:vaccine-classification}), we extracted the main subjects of vaccine opposition and support. First, we considered 500 randomly selected tweets from the two groups combined. Next, we used a grounded theory approach and inductive analysis to identify the main themes manually. We assigned related themes to each tweet and extracted essential keywords relevant to each theme from the content of the tweets. To focus only on the principal matters of each tweet, at most three relevant themes were considered per tweet. Afterward, for each theme found in the vaccine opposition and support groups, we established a set of keywords identifying the concept of the subject. Finally, these keywords were used to categorize the rest of the tweets in each vaccine-related group. The aim was to find one or more themes for at least 85\% of tweets (except for neutral ones). To reach this goal, we continued grouping tweets while adding extra categories. Once this target was reached, we had found 15 distinct themes, each with its own set of keywords, for the vaccine opposition group and 16 for the vaccine-supportive group, meaning that the core topics of 85\% of the vaccine opposition and support tweets were identified using 31 themes. The remaining 15\% had vague or unknown overall topics. Most of the short tweets (fewer than four words) fell into this group. Since we used a keyword-based approach, an insufficient number of words was the most common reason for a tweet not to be assigned to any pre-defined theme. For instance, \textit{How about Vaccination?} does not convey any meaningful or subjective opinion on the subject.




\subsection{User Interaction Analysis}

Evaluating user activities, especially for influencers (users with a high interaction rate), can give us insight into user attitudes and changes in trends that are not perceivable by assessing tweets alone.

To evaluate users' behavior toward the Covid-19 vaccination, we first categorized users, on a monthly basis from February 2020 to December 2021, into four groups: anti-vaccination, neutral, pro-vaccination, and mixed. If 60\% or more of a user's tweets about vaccination in a month belonged to the vaccine opposition group, the user was categorized in the anti-vaccination group for that specific month. We classified the pro-vaccination and neutral groups in a similar fashion. If, based on these criteria and the mentioned threshold, a user could not be fitted into any specific group in a month, we considered him/her as mixed. The full details of the user classification method are presented in Algorithm \ref{alg:cap}.


\begin{algorithm}
\caption{Single User Classification Algorithm}\label{alg:cap}
    \begin{algorithmic}

    \State $a \gets$ percentage of the user's anti-vaccination tweets in a month
    \State $p \gets$ percentage of the user's pro-vaccination tweets in a month
    \State $n \gets$ percentage of the user's neutral tweets in a month

    \If{any of $a$, $p$, $n$ is greater than $60$}
        \State User is assigned to the corresponding group
    \ElsIf{$a = 0$}    \Comment{$40 \leq p, n \leq 60$}
        \State User is classified as pro-vaccination
    \ElsIf{$p = 0$} \Comment{$40 \leq a, n \leq 60$}
        \State User is classified as anti-vaccination
    \Else
        \State User is classified as mixed
    \EndIf
\end{algorithmic}
\end{algorithm}
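
To illustrate Algorithm \ref{alg:cap} with hypothetical monthly shares $(a, p, n)$: a user with $(70, 10, 20)$ is labeled anti-vaccination by the first rule; a user with $(0, 55, 45)$ exceeds no threshold but has $a = 0$ and is therefore labeled pro-vaccination; a user with $(45, 0, 55)$ is labeled anti-vaccination because $p = 0$; and a user with $(35, 35, 30)$ satisfies none of the first three conditions and falls into the mixed group.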

In the next step, we assessed influencers' activities and interactions. We built a user interaction graph in which there is an edge between two users if one has mentioned or replied to the other. The total number of a user's connections (the degree of the node) serves as the metric for a person's influence. After computing the number of connections each user had per month, we took the 40 users with the highest degree in each of the 23 available months as the influencers (the top 0.2 percent of each month's users). Then, we studied the distribution of influencers with respect to the four categories mentioned before. Lastly, to obtain an overview of the overall interactions and the effect of the vaccination program, we created two social networks of the users: one covering the period before the start of public vaccination in Iran (1 June 2021) and the other covering the period after that date.
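
In symbols (the notation is introduced here only for illustration, and counting each neighbor once is one possible reading of the number of connections), let $E_m$ be the set of mention or reply links observed in month $m$. The influence score of user $u$ in that month is then the degree

$$
d_{m}(u)=\left|\left\{v : \{u, v\} \in E_{m}\right\}\right|,
$$

and the influencers of month $m$ are the 40 users with the largest $d_{m}(u)$. Since these 40 users correspond to the top 0.2 percent, the implied number of active users is on the order of $40 / 0.002 = 20{,}000$ per month; this is only a rough figure, as the exact count varies from month to month.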


\section{Results}
\label{sec:results}

\subsection{Vaccine-related Tweets}

We gathered 3,539,196 tweets relevant to Covid-19, of which 1,037,440 were categorized as vaccine-related (shown in Figure \ref{fig:vaccine-tweets-timeline}) based on our hybrid approach described in Section \ref{sec:methods}.

From February 2020 to December 2021, an average of 37.65\% (median 42.09\%) of Covid-19 tweets per day were related to vaccination. To delve deeper into this analysis, we assessed our data concerning two important dates, the introduction of Coronavirus vaccines (9 February 2021) and the beginning of the public vaccination in Iran (1 June 2021).

According to our evaluations shown in Table \ref{tab:tweets-over-pandemic}, a greater proportion of the tweets after 9 February 2021 were related to vaccination than in the preceding period of the pandemic. Similarly, after the beginning of public vaccination, the share of vaccine-related tweets was considerably higher than before that date. The official introduction of vaccines and the start of public vaccination were thus each followed by a sharp increase in vaccine-related tweets, reflecting the new subjects arising from vaccine matters, such as side effects, effectiveness, and general opinions toward taking vaccines.

\begin{table}[ht] 
\caption{Vaccine-related Tweets over Covid-19 Pandemic} 
\centering
    \begin{tabular}{|c|c|}
\hline\hline
 Pandemic Period & Daily Avg. Share of Vaccine-related Tweets \\ [0.25ex] 
\hline
Before 9 Feb. 2021 &  21.11\%\\
After 9 Feb. 2021 &  54.43\%\\
\hline
Before 1 Jun. 2021 &  27.29\%\\
After 1 Jun. 2021 &  58.71\%\\
\hline\hline  
\end{tabular} 
\label{tab:tweets-over-pandemic} 
\end{table}
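
Reading Table \ref{tab:tweets-over-pandemic} as ratios makes the change explicit:

$$
\frac{54.43}{21.11} \approx 2.58, \qquad \frac{58.71}{27.29} \approx 2.15,
$$

that is, the daily share of vaccine-related tweets rose by about 33 percentage points after 9 February 2021 and by about 31 percentage points after 1 June 2021.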

\begin{figure*}
    \centering
    \includegraphics[width=\textwidth]{pics/vaccine-tweets-timeline3.png}
    \caption{Relative Percentage of Vaccine Tweets Over Time} 
    \label{fig:vaccine-tweets-timeline} 
\end{figure*}

\subsection{Vaccine-related Tweets Classification}
\label{subsec:vaccine-classification}

To classify our vaccine-related tweets into vaccine-opposition, neutral, and vaccine-support groups, we labeled 6000 tweets and randomly split the tagged data into training and validation sets. We used 5000 tweets as the training set and the remaining 1000 for validation. Further details of the partition are provided in Table \ref{tab:train_eval_sets}.

\begin{table}[ht] 
\caption{Polarity Distribution in Training and Validation Sets} 
\centering
    \begin{tabular}{|c|c|c|c|}
\hline\hline
 Sets & \  Critical & \  Neutral & \  Supportive \\ [0.25ex] 
\hline
Training & 1467 & 2167 & 1366 \\
Validation & 268 & 444 & 288 \\
\hline\hline  
\end{tabular} 
\label{tab:train_eval_sets} 
\end{table}


Based on the dataset extension method described in Section \ref{sec:methods}, where we had four hyperparameters to tune, the best average result belonged to the dataset with no duplicate removal, a moderate odd pattern removal, no lemmatization, and no stopword elimination. We called this dataset the final dataset. We continued by fine-tuning our transformer-based models on the final dataset and compared their results to find the best model for our classification task. Table \ref{tab:classification-results} displays more information about the top results of our classification. As can be seen, our fine-tuned Pars-BERT model outperforms all the other approaches with an F1-Score of 62.03\%. Other models, such as BERT, RoBERTa, Twitter-RoBERTa, XLM-R, and XLNet, did not reach an F1-Score of more than 30\%.

\begin{table}[ht] 
\caption{Vaccine-related Tweets Classification Results (O: opposition, N: neutral, S: support)} 
\centering
    \begin{tabular}{|c|c|c|}
    \hline\hline
     Models & F1-Score (\%) & Accuracy (O, N, S) (\%) \\ [0.25ex] 
    \hline
    Persian ALBERT & 39.9 & 77.22, 50.34, 4.24 \\
    Persian RoBERTa & 53.78 & 48.40, 61.78, 46.64 \\
    Persian DistilBERT & 58.45 & 51.25, 69.57, 49.12 \\
    \textbf{Pars-BERT} & \textbf{62.03} & \textbf{63.06, 60.81, 61.81} \\ 
    \hline\hline 
\end{tabular} 
\label{tab:classification-results} 
\end{table}
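
For reference, the F1-Score of a single class combines its precision $P$ and recall $R$ as

$$
F_{1}=\frac{2 P R}{P+R}.
$$

For a three-class problem such as this one, a common choice is the unweighted (macro) mean of the per-class scores over the opposition, neutral, and support classes. The macro average is mentioned here only as an assumption about a typical setup; the formula above is the standard per-class definition.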


\subsection{Emotion Analysis}

The results of tweet emotion detection are presented in Figures \ref{fig:hap-emo} and \ref{fig:ang-emo}. In these time series, several important dates (peaks and valleys) exist for each emotion type (shown by black triangles). We cross-referenced these dates with the introduction of vaccines and two available time series, namely, the daily number of new cases and the number of deaths. We found several interesting correlations, including:

\begin{itemize}
    \item March 2020 - April 2020: The first peak of the pandemic (first happiness valley and anger peak). During the first worldwide Covid-19 shock, with no vaccines or other treatments available, there was widespread public panic about the consequences of the Coronavirus.
    \item June 2020 - July 2020: The recovery from the first peak (first happiness peak and anger valley). Although no vaccine had been developed yet, the overall decline in Covid-19 infections gave rise to the belief that the public was less susceptible to the disease.
    \item October 2020 - November 2020: The start of the third epidemic wave (second anger peak). The initiation of vaccination in other countries and the reports on the effectiveness of vaccines, together with the critical situation and high infection rate in Iran, caused considerable dissatisfaction and outrage over the public situation regarding Covid-19.
    \item July 2021 - September 2021: The period of Delta variant infection (third happiness valley and anger peak). The Delta wave was one of the most severe periods in terms of daily new cases and deaths. As a result, although vaccination had helped to curb sad and angry opinions, the last anger peak and happiness valley are more pronounced than those at the other important dates.
\end{itemize}

\begin{figure}
    \centering
    \includegraphics[scale=0.3]{pics/happiness_emotion4.png}
    \caption{Happiness Trend of Vaccine-Related Tweets during Covid-19 Pandemic} 
    \label{fig:hap-emo} 
\end{figure}

\begin{figure}
    \centering
    \includegraphics[scale=0.3]{pics/anger_emotion5.png}
    \caption{Anger Trend of Vaccine-Related Tweets during Covid-19 Pandemic} 
    \label{fig:ang-emo} 
\end{figure}

Furthermore, looking at the entire happiness time series, we observe an overall rising tendency. This upward trend is apparent when we compare the periods before and after the introduction of vaccines (February 2021). The anger time series shows the opposite behavior: we see a declining tendency when comparing the averages before and after the introduction of vaccines. We used Spearman's rho and Pearson's correlation coefficient to evaluate the correlation between the happiness and anger trends. Both coefficients were $-0.965$ with p-value $<$ 0.001, showing a strong negative correlation between these two trends.
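
For completeness, and with symbols introduced here only for illustration, let $h_t$ and $a_t$ ($t = 1, \dots, n$) denote the daily happiness and anger scores with means $\bar{h}$ and $\bar{a}$. Pearson's coefficient is

$$
r=\frac{\sum_{t=1}^{n}\left(h_{t}-\bar{h}\right)\left(a_{t}-\bar{a}\right)}{\sqrt{\sum_{t=1}^{n}\left(h_{t}-\bar{h}\right)^{2}} \sqrt{\sum_{t=1}^{n}\left(a_{t}-\bar{a}\right)^{2}}},
$$

and Spearman's rho is the same quantity computed on the ranks of the two series. A value close to $-1$ means that days with above-average happiness are almost always days with below-average anger.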

As the results show, the vaccination period coincided with a marked change in public happiness and anger toward the Coronavirus. A plausible explanation is that, given the vaccines' effectiveness, people came to trust vaccination more as a remedy for the Coronavirus and hence posted fewer sad or angry tweets on the Covid-19 subject. Furthermore, the strong negative correlation between happiness and anger indicates that sadness (low happiness) and anger rose and fell together regarding the Covid-19 vaccination, which could be an example of how different emotions are affected in a similar manner by an external factor such as a pandemic. It might also suggest that emotions of the same polarity, whether negative or positive, strengthen each other when they are aligned.










\subsection{Vaccine Themes}
Classified data were analyzed to extract themes for both vaccine-critical and vaccine-supportive tweets. Based on the extraction method described before, 219,646 tweets were labeled as having vaccine-opposition content. These tweets belonged to 15 distinct categories (plus one category named \textit{other}). The same approach was adopted for the vaccine-supportive tweets, which consisted of 339,351 distinct tweets in 16 different themes. As on the critical side, a category called \textit{other} was also considered. The details of both groups of themes are available in Table \ref{tab:vaccine-themes}. Since we utilized a keyword-based approach, a single tweet may belong to more than one category (in both groups). Therefore, the percentages for each of the supportive and critical groups sum to more than 100\%.



\begin{table}[ht] 
    \caption{Brief Description of Vaccine Themes}
    \centering
    \begin{tabular}{|p{0.23\linewidth}|p{0.37\linewidth}|p{0.12\linewidth}|p{0.14\linewidth}|}
    \hline\hline
     Theme Name & \  Description  & \  Critical & \  Supportive \\ [0.25ex] 
    \hline
    Side Effects & Mentions of health impacts caused by vaccines & 43,608 (19.85\%) & 46,551 (13.72\%) \\
    \hline
    Pharmaceuticals & Talks about vaccine names and companies making vaccines & 34,398 (15.66\%) & 47,506 (14.00\%) \\
    \hline
    Political / Governmental & Conversations on governmental actions towards mass vaccination & 94,748 (43.14\%) & 135,095 (39.81\%) \\
    \hline
    Vaccine Ingredients & Related to how vaccines are created and their materials & 10,293 (4.69\%) & 7,991 (2.35\%) \\
    \hline
    Research Trials & References to experiments and lab works & 26,394 (12.02\%) & 62,252 (18.42\%) \\
    \hline
    Religion & Topics on faith and religious practices & 9,793 (4.46\%) & 18,971 (5.59\%) \\
    \hline
    Ineffectiveness / Hesitancy & Conversations on poor impressions of vaccines and their inability to fight Covid-19 & 50,639 (23.05\%) & - \\
    \hline
    Safety / Sufficiency & References to vaccine performance and ability & - & 88,627 (26.12\%) \\
    \hline
    Disease Prevalence & Mentions of virus mutations over time & 4,756 (2.17\%) & 20,843 (6.14\%) \\
    \hline
    Family & Expression of the concern for family members and relatives & 15,278 (6.96\%) & 28,253 (8.33\%) \\
    \hline
    Foreign Countries & Talks of pandemic state in other countries and imported vaccines & 93,478 (42.56\%) & 79,794 (23.51\%)  \\
    \hline
    Lockdown Denial & Related to ignoring the pandemic and worldwide crisis &  5,602 (2.55\%) & - \\
    \hline
    Pandemic Confirmation & Relevant to accepting the pandemic & - & 88,913 (26.20\%) \\
    \hline
    Mandatory Vaccination & Criticism of forced vaccination and encouragements & 15,616 (7.11\%) & - \\
    \hline
    Influential Users & Mentions of influencers and their actions towards vaccination & 15,334 (6.98\%) & 31,970 (9.42\%) \\
    \hline
    Vaccine Alternatives & Other vaccine substitutes, their advantages and disadvantages & 4,914 (2.24\%) & 2,406 (0.71\%) \\
    \hline
    Medics and Hospitals & Relevant to doctors and other treatment staff & 37,053 (16.87\%) & 48,472 (14.28\%) \\
    \hline
    Hope / Envy & Expressions of impatience towards receiving vaccination & - & 37,142 (10.95\%) \\
    \hline
    Availability & Demanding public vaccination from authorities & - & 9,467 (2.79\%) \\
    \hline
    Others & Not categorized in any of the themes & 31,953 (14.53\%) & 45,659 (13.45\%) \\
    
    \hline\hline  
    \end{tabular}
    \label{tab:vaccine-themes} 
\end{table}
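
The percentages in Table \ref{tab:vaccine-themes} are shares of the corresponding group total. For example, for the Side Effects theme,

$$
\frac{43{,}608}{219{,}646} \approx 19.85\%, \qquad \frac{46{,}551}{339{,}351} \approx 13.72\%,
$$

for the critical and supportive groups, respectively. Because a tweet may be assigned to more than one theme, these shares are not mutually exclusive.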

Figure \ref{fig:co-occ} illustrates the correlation of themes for both the supportive and critical groups. The left correlation matrix belongs to the supportive themes, and the right one represents the critical side. Several strong relationships are worth mentioning:
\begin{itemize}
    \item Influencers and Political (both supportive and critical): Most of the tweets concerning influencers, such as actors and officials, discuss their reactions and decisions toward Covid-19 in light of the political situation in Iran.
    \item Prevalence and Pandemic Confirmation (supportive): As the prevalence and mutations of Covid-19 affect more and more people, acceptance of the pandemic becomes more widespread and opinions in favor of taking vaccines become more common.
    \item Ingredients and Side Effects (supportive): Discussions of vaccine ingredients usually touch on matters affecting human health in the short or long term. That is why most of the tweets discussing ingredients also refer to side effects in humans.
    \item Denial and Ineffectiveness (critical): Ignoring the pandemic goes hand in hand with disregarding the Covid-19 crisis. On the one hand, people denying the Coronavirus might also tend to deny vaccines and their effectiveness; on the other hand, they might consider both Covid-19 and vaccines a delusion.
    \item Religious and Political (critical): Tweets containing spiritual concepts talk about Covid-19 from a religious viewpoint. According to the results, these tweets seem to relate the political decisions on vaccination in Iran to religious instructions.
\end{itemize}



\begin{figure}
    \centering
    \includegraphics[scale=0.51]{pics/co-occurrences-ann6.png}
    \caption{Support (a) and Opposition (b) Themes Correlation} 
    \label{fig:co-occ} 
\end{figure}



\subsection{User Interaction Analysis}

Assessing users' mindsets behind their tweets led us to categorize them into four different groups: anti-vaccination, pro-vaccination, neutral, and mixed. Figure \ref{fig:user-classification} presents the flow of changes in anti, pro, and mixed classes based on the relative percent of monthly coverage for each group during the Covid-19 pandemic.

\begin{figure}
    \centering
    \includegraphics[scale=0.4]{pics/user_classification7.png}
    \caption{User Classes During Covid-19 Pandemic} 
    \label{fig:user-classification} 
\end{figure}

As shown, between the introduction of Coronavirus vaccines in Iran and the beginning of public vaccination (February to June 2021), the percentage of anti-vaccination users was 4.18\% lower than in the months during which vaccination was in progress. Similarly, the ratio of anti-vaccination users between February and June 2021 was 2.78\% lower than in the months prior to vaccine introduction. On the other hand, the analysis of pro-vaccination users showed that, in the period from vaccine introduction up to the end of 2021, the percentage of vaccination supporters was 9.67\% higher than before February 2021.

By analyzing the results, we observed that vaccination and its outcomes helped reduce criticism of vaccines. To evaluate the activity of each group, we also calculated the ratio of the number of tweets to the number of users for both the supportive and critical groups. According to Figure \ref{fig:tweet-user-ratio}, the average ratio for the critical group is 0.2 (15.7\%) higher than that of the supportive group. This difference is even more pronounced in the time between the introduction of vaccines and the start of public vaccination. From these results, we can infer that during this period, people understood that vaccination was inevitable; hence, they expressed their opposition and hesitancy even more. On the other side, those who agreed with vaccination voiced their thoughts more widely than before.
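
Concretely, and with notation introduced here only for illustration, the activity ratio of a group $g$ is

$$
r_{g}=\frac{n^{\mathrm{tweets}}_{g}}{n^{\mathrm{users}}_{g}},
$$

where $n^{\mathrm{tweets}}_{g}$ is the number of tweets posted by the users of group $g$ and $n^{\mathrm{users}}_{g}$ is the number of those users. If the reported gap of $0.2$ is read as a relative difference of $15.7\%$ with respect to the supportive group, then $0.2 \approx 0.157 \, r_{\mathrm{supportive}}$, which would place the average supportive ratio at roughly $0.2 / 0.157 \approx 1.3$ tweets per user. This back-of-the-envelope value is sensitive to the rounding of both figures and should be treated as indicative only.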

After the initiation of public vaccination, there was a considerable fall in the ratio for the critical group, suggesting that the recovery results convinced some critics to accept the efficiency of Covid-19 vaccines. However, judging from the slight decline in the supportive group, it appears that the vaccination results were not as promising as supporters had expected.

\begin{figure}
    \centering
    \includegraphics[scale=0.39]{pics/tweet_to_user_ratio8.png}
    \caption{Tweet to User Ratio} 
    \label{fig:tweet-user-ratio} 
\end{figure}

In the next step, we evaluated influencers by considering user interactions, which include replies and mentions. As previously mentioned, the top 40 users with the highest rate of interaction for each month were labeled as influencers. Figure \ref{fig:influencer-results} shows the classification of such users during the pandemic. By looking at the number of influencers categorized as pro-vaccination and anti-vaccination, we found that vaccine-critical influencers made up 7.91\% of the whole influencer population before the introduction of Covid-19 vaccines in Iran and 8.63\% afterwards. On the other side, the share of vaccine supporters increased from 16.04\% to 18.18\%. From these observations, we can infer that the dissemination of vaccines resulted in more non-neutral tweets and conversations from influencers, as factors such as efficiency and side effects became much more apparent than before and users became more opinionated.


\begin{figure}
    \centering
    \includegraphics[scale=0.25]{pics/influencers_per_month9.png}
    \caption{Top Influencers Per Month} 
    \label{fig:influencer-results} 
\end{figure}

To summarize the overall interactions and the impact of the vaccination program, we created two networks, one for the period before ($BV$) and the other for the period after ($AV$) the start of public vaccination (June 2021) in Iran, shown in Figure \ref{fig:before-after-vaccination}. For these networks, we excluded users who had fewer than 350 interactions in each of the two mentioned periods. Green nodes represent pro-vaccination users and red ones anti-vaccination users. Neutral and mixed users appear as blue and gray nodes, respectively. Furthermore, some users are not found in our separately gathered dataset of users; these might be non-Persian users mentioned or replied to by others, and we marked them in black. The number of connections is indicated by node diameter.


\begin{figure*}
    \centering
    \includegraphics[width=\textwidth]{pics/before_after_vaccination_ann10.png}
    \caption{User Interactions before (a) and after (b) Public Vaccination in Iran. Red nodes denote anti-vaccination users, green nodes pro-vaccination users, blue nodes neutral users, gray nodes mixed users, and black nodes unclassified users.} 
    \label{fig:before-after-vaccination}
\end{figure*}



According to our evaluations, before June 2021, anti-vaccination users constituted 7.11\% of all users, while they formed only 4.73\% after that time. Likewise, pro-vaccination users accounted for 12.44\% before June 2021, whereas they made up 9.82\% afterward. These two trends disagree with what we observed for the influencers, suggesting that ordinary users on both sides of the argument became less fixated on their positions and, on average, either posted less content or took relatively neutral stands toward vaccination.

Table \ref{tab:networks-measures} shows the overall statistics of both networks. Based on the in-degree and density measures, we observed that users tended to receive fewer mentions and replies after public vaccination than in the previous period. Similarly, the rate of contribution to vaccine-related tweets decreased. These results suggest that after public vaccination and its observable effects, the level of Covid-related reactions in tweets decreased considerably. Nevertheless, the number of top influencers (drawn with a large diameter) increased, especially among anti-vaccination and pro-vaccination users, suggesting that logical discussions among prominent members intensified.

\begin{table}[ht] 
\caption{User Interaction Network Measures} 
\centering
    \begin{tabular}{|c|c|c|}
    \hline\hline
    Measures & \  BV & \   AV \\ [0.25ex] 
    \hline
    Average in-degree & 25.88 & 21.52 \\
    Clustering Coefficient & 0.237 & 0.230 \\
    Density & 0.107 & 0.062 \\
    Homophily & 0.083 & -0.097 \\
    Average Path Length & 2.07 & 2.41 \\
    \hline\hline 
\end{tabular} 
\label{tab:networks-measures} 
\end{table}
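
For reference, under the usual definitions for a directed graph with $n$ nodes and $m$ edges and no self-loops (an assumption about how the measures were computed), the average in-degree $\bar{d}_{\mathrm{in}}$ and the density $\rho$ in Table \ref{tab:networks-measures} are linked by

$$
\bar{d}_{\mathrm{in}}=\frac{m}{n}, \qquad \rho=\frac{m}{n(n-1)}=\frac{\bar{d}_{\mathrm{in}}}{n-1}.
$$

A lower density can therefore result from fewer interactions per user, from a larger set of users, or from both. Taken at face value, the reported values would correspond to roughly $25.88 / 0.107 \approx 242$ and $21.52 / 0.062 \approx 347$ for $n-1$ in the $BV$ and $AV$ networks, respectively; these figures are only indicative because the table entries are rounded.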

Furthermore, consistent with the homophily measure, which captures the tendency of similar nodes to attach to each other, we observed that before vaccination, discussion was most pronounced among users with similar views, shaped by influencers such as news accounts. In contrast, as vaccination brought about both recovery outcomes and side effects, controversy among groups with different viewpoints increased after vaccination.

\section{Conclusion}
\label{sec:conclusion}

In this study, using a keyword-based method, we extracted Covid-19-related tweets and performed topic modeling to identify the main subjects discussed around Covid-19. Combining the topic modeling results with a keyword-based search, we obtained the vaccine-related tweets posted during the Coronavirus pandemic up to the end of 2021 in Iran. We then classified vaccine-related tweets into vaccine-critical, neutral, and vaccine-supportive groups and extracted the main themes discussed around the Covid-19 vaccination.

Moreover, we carried out a happiness and anger analysis to further evaluate public opinion toward vaccination. Afterwards, we performed a range of analyses to assess how users reacted to the evolution of vaccines for Covid-19. The results demonstrate the immense potential of online platforms to provide insight into people's reactions to a crisis and how their behavior evolves. Although the use of data from such platforms to understand the public response to Covid-19 has been explored to a certain degree, this study is among the first to address the issue in the Persian language. Future work could pursue a more comprehensive analysis of network properties and structures, such as community detection, to gain a richer understanding of influential users and their connections. Furthermore, we did not separate real accounts from fake users and bots. An accurate methodology to exclude bots from the user base would be beneficial for more robust insights into user behavior. Another important topic related to bots is their influence in steering society's way of thinking about vaccination and social matters in general. Studying their presence, the attributes or features that separate them from normal users, and the content they spread can be explored in the future so that more cohesive and reliable content can be provided to people searching for information.















