\section{Introduction}

In recent years, recommendation algorithms on social platforms have greatly amplified confirmation bias by showing users the content most likely to match their interests, the so-called \emph{filter bubble} effect \citep{pariser}. As a consequence, increasingly isolated and tightly clustered online communities of like-minded individuals, often referred to as \emph{echo chambers}, have arisen in various domains such as politics \citep{cota2019,delvicario2017,garimella2018}, healthcare \citep{allington2020,health_polarisation,monsted2022} or science \citep{williams2015}. Because of the so-called \emph{backfire effect}, presenting these users with opposing information might have the adverse effect of reinforcing their prior beliefs \citep{bail2018,schaewitz2020}. Finding ways to prevent such polarisation of opinion is a major challenge in today's world. This paper is a step in this direction, as we propose ways of maximising the diversity of beliefs as well as the exposure to adverse views in a social group.

To this end we rely on the well-known voter model, in which each user holds one of two possible opinions (e.g.\ liberal or conservative, pro- or anti-abortion) and updates it at random according to the distribution of others' beliefs. Independently introduced by \cite{clifford_sudbury} and \cite{holley1975} in the context of interacting particle systems, this model has since been used to describe, in a simple and intuitive manner, social dynamics where people are divided between two parties and form their opinion by observing that of others around them. We assume some of the users are stubborn and never change opinion. We call them \emph{zealots} as in \cite{mobilia2003,mobilia2007}. They can represent lobbyists, politicians or activists for example. The long-time dynamics and limiting behaviour of such processes have been the subject of several studies \citep{mobilia2007,mukhopadhyay2020,binary_opinion}.

To achieve our goal we propose equilibrium formulas for both the opinion diversity $\sigma$ and the density of active links $\rho$, which is the proportion of connections that join opposite-minded users. The former is based on earlier results from \cite{masuda2015}. The latter uses a mean-field approximation and we show that it performs well when compared to numerical simulations. We then study the problems of maximising these quantities by turning \emph{free} (\emph{i.e.}\  non-zealous) users into zealots in the presence of a backfire effect. We model this effect by assuming that any increase in the number of zealots entails the \emph{radicalisation} of some non-zealous users, turning them into zealots with the opposite opinion. We provide exact solutions in the specific case of a complete, unweighted network for both the problems of maximising $\sigma$ and maximising $\rho$. For $\sigma$ we also propose a method to optimise it in general networks.

Finally we apply our findings to a real-life dataset. Namely, we study the evolution of the composition of the US House of Representatives since 1947. We treat it as a realisation of the voter model and estimate the corresponding quantity of zealots based on empirical values of the equilibrium metrics $\sigma$ and $\rho$. We then solve our maximisation problems in this case and find that maximising $\rho$ by acting on Democrat zealots can help increase both $\rho$ and $\sigma$.

All code used is available online\footnote{\url{https://github.com/antoinevendeville/howopinionscrystallise}}.

\section{Related Literature}
Perhaps the earliest milestones in the study of opinion dynamics are the works of \cite{french} and \cite{degroot}, who studied how a society of individuals may or may not come to an agreement on some given topic. Assuming the society is connected and people repeatedly update their belief by taking weighted averages of those of their neighbours, they showed that consensus is reached. That is, everyone eventually agrees. Various other models have been developed since, to address the question of under which circumstances and how fast a population is able to reach consensus. Amongst others, \cite{friedkin_johnsen} introduce immutable innate preferences, \cite{axelrod} studies the effect of homophily, \cite{word_of_mouth} assume individuals are perfectly rational and \cite{nbsl} account for the influence of external events.

The voter model was introduced independently by \cite{clifford_sudbury} and \cite{holley1975} in the context of interacting particle systems. They proved that consensus is reached on the infinite $\mathbb{Z}^d$ lattice. Several works have since looked at different network topologies, asking whether consensus is reached, on which opinion and at what speed. Complete graphs \citep{yehuda2002,sood2008,perron2009,yildiz2010}, Erdös-Rényi random graphs \citep{sood2008,yildiz2010}, scale-free random graphs \citep{sood2008,fernley2019}, and various other structures \citep{sood2008,yildiz2010} have been addressed. Variants where nodes deterministically update to the most common opinion amongst their neighbours have also been studied \citep{chen2005,mossel2013}.

An interesting case to consider is the one where zealots, \emph{i.e.}\  stubborn agents who always keep the same opinion, are present in the graph. Such agents may for example represent lobbyists, politicians or activists, \emph{i.e.}\  entities looking to lead rather than follow and who will not easily change sides. A single zealot placed within the network can singlehandedly change the outcome of the process \citep{mobilia2003,sood2008}. If several of them are present on both sides, consensus is usually not reachable and instead opinions converge to a steady state in which they fluctuate indefinitely \citep{mobilia2007,binary_opinion}.

Recently, \cite{mukhopadhyay2020} considered zealots with different degrees of zealotry and proved that the time to reach consensus grows linearly with their number. They also showed that if one opinion is initially preferred, \emph{i.e.}\  agents holding that opinion have a lower probability of changing their mind, consensus is reached on the preferred opinion with a probability that converges to 1 as the network size increases. \cite{klamser2017} studied the impact of zealots on a dynamically evolving graph, and showed that the two main factors shaping their influence are their degrees and the dynamical rewiring probabilities.

With the increasing importance of social networks in political debate and information diffusion, there has been a recent surge in research aiming at controlling opinions, often with the goal of reducing polarisation. With the Friedkin-Johnsen model, \cite{goyal2019} provide algorithms for selecting optimal sets of stubborn nodes in order to push opinions in a chosen direction. \cite{yi2019disagreement} formulate different constrained optimisation problems under the French-DeGroot and the Friedkin-Johnsen models. They provide solutions in the form of optimal graph construction methods.

Still within the Friedkin-Johnsen paradigm, \cite{chitra2020} prove that dynamically nudging edge weights in the user graph can reduce polarisation while preserving relevance of the content shown by the recommendation algorithm. \cite{garimella2017bis} propose a method to reduce polarisation through addition of edges in the network. The focus is put on which nodes to connect in order to get the best reduction in polarisation, while being sure that the edge is ``accepted'' --- as extreme recommendations might not work because of the backfire effect. Finally, \cite{cen2020} propose a data-driven procedure to moderate the gap between opinions influenced by a neutral or a personalised newsfeed. Importantly, they show that this can be done even without knowledge of the process through which opinions are derived from the newsfeed. 

Of particular interest to us, \cite{binary_opinion,masuda2015,moreno2021} study the voter model and propose strategies to find optimal sets of zealots in order to push opinions in a chosen direction. The present work is in a similar vein but the objective is different, as we are trying to adjust the balance between both opinions rather than promoting one of the two.

\paragraph{Our contribution}

In a previous work \citep{vendeville2022} we studied the voter model with zealots in connected graphs with arbitrary degree distribution. Extending a result from the literature, we proved that the expected average opinion $\bar{x}^*$ of the population at equilibrium is given by the proportion of opinion 1 amongst zealots. Furthermore we solved the problem of controlling $\bar{x}^*$ via injection of zealots in the presence of a backfire effect.

In the present paper, we turn to the case where the network is weighted, directed and not necessarily connected. The vector $x^*$ of individual average opinions at equilibrium is then given by the solution of a linear system \cite[eq.~(4)]{masuda2015}. We adapt the problem of controlling $\bar{x}^*$ under backfire effect to that of controlling diversity, which we define as a function of $\bar{x}^*$. We show that it can be solved efficiently by gradient ascent. This approach however does not guarantee the existence of a dialogue between users, as even with $\bar{x}^*\approx 1/2$ the network might split into hermetic echo chambers with opposite opinions. Thus we suggest a novel, alternative approach for the diversification of opinion in social networks. Instead of controlling the average opinion, we propose to control the density of active links, \emph{i.e.}\  the proportion of edges that connect users with different opinions.

With the profusion of theoretical works and models on opinion dynamics in recent years, the need for validation on real data has become increasingly pressing. An attempt to fit the voter model to election results in the UK and in the US was the object of a previous publication of ours \citep{vendeville2020}. In the present paper we illustrate our findings by applying the developed methods to the evolving network of the House of Representatives in the United States. With the gradual disappearance of independent members and the fading of cross-party agreement, it has become a prime example of a polarised network divided into two antagonistic camps \citep{andris2015}. We find that the network exhibits high levels of opinion diversity but lower levels of active links density. In this context we solve the optimisation problems developed in the theoretical sections, providing optimal numbers of zealots that maximise either $\sigma$ or $\rho$. We find that both can be increased in some cases.

\section{The Voter Model with Zealots} \label{voter_model} 
In the traditional voter model, users are placed on the $\mathbb{Z}^d$ lattice and hold individual opinions in $\{0,1\}$. Given an initial distribution of opinion, each user updates their opinion at the times of an independent Poisson process of parameter 1 by copying a neighbour chosen uniformly at random. Letting $x_i(t)$ denote the opinion of user $i$ at time $t$, we say that consensus is reached if almost surely all users eventually agree, \emph{i.e.}\  if
\begin{equation} \label{consensus}
	\forall i,j, \quad \mathbb{P} \left(x_i(t) = x_j(t) \right) \underset{t \rightarrow \infty}{\longrightarrow} 1.
\end{equation}
On any finite connected network, consensus is reached \citep{aldous_fill_2014}. Intuitively, no matter the current number of opinion-0 and opinion-1 users, there exists a succession of individual opinion changes with strictly positive probability that results in everyone holding the same opinion. 

It might however seem unrealistic to imagine that all people in a group are willing to change opinions. An interesting extension of the traditional voter model is to include stubborn agents who never change their opinions, often referred to as \emph{zealots} \citep{mobilia2007,binary_opinion}. They form an inflexible core of partisans who bear great power of persuasion over the population. If all zealots defend the same opinion then, via arguments similar to those for eq.~(\ref{consensus}), this opinion is eventually adopted by all. When both camps count such agents within their ranks however, there always exists a strictly positive number of users with each opinion. This prevents consensus and instead the system reaches a state of equilibrium in which it fluctuates indefinitely.

\paragraph{Framework}
Although we will consider a complete and unweighted user graph in our application, most of the analysis is presented in the general case of a directed, weighted network. We assume there are $N$ users among which $z$ are zealots. The remaining $F:=N-z$ users are referred to as \emph{free}. The set of free users is denoted by $\mathcal{F}$, the set of zealots with opinion 0 by $\mathcal{Z}_0$, the set of zealots with opinion 1 by $\mathcal{Z}_1$ and the set of all zealots by $\mathcal{Z}:=\mathcal{Z}_0\cup\mathcal{Z}_1$.

For any pair $(i,j)$ of users we let $w_{ij}\ge 0$ be the weight of the directed edge $j\rightarrow i$, representing the power of influence that $j$ has over $i$. If $i\in\mathcal{Z}$ we set $w_{ij}=0$ for all $j$. We do not assume uniform choice anymore and when updating their opinion, $i$ will copy $j$ with probability proportional to $w_{ij}$. We assume $w_{ii}$ to be zero, meaning users cannot choose to copy themselves -- this assumption may be relaxed in the future and we expect the results presented here to hold.

Opinions thus evolve as follows. Assume $i\in\mathcal{F}$ updates their opinion at time $t$ when the vector of opinions is $x(t)$. Then $i$ will adopt opinion 1 with probability $d_i^{-1}\sum_{j=1}^N w_{ij}x_j(t)$ where $d_i$ is the total influence exerted on them, defined by $d_i = \sum_{j=1}^N w_{ij}$. This quantity can be seen as the in-degree of node $i$. Zealots do not update their opinions and receive no external influence, thus they have in-degree 0.
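
As a minimal illustration with hypothetical values, suppose a free user $i$ receives influence from exactly three users $a$, $b$ and $c$, with weights $w_{ia}=2$, $w_{ib}=1$, $w_{ic}=1$ and current opinions $x_a(t)=1$, $x_b(t)=0$, $x_c(t)=1$. Then
\begin{equation*}
	d_i = 2+1+1 = 4, \qquad \frac{1}{d_i}\sum_{j=1}^N w_{ij}x_j(t) = \frac{2\cdot 1 + 1\cdot 0 + 1\cdot 1}{4} = \frac{3}{4},
\end{equation*}
so that $i$ adopts opinion 1 with probability $3/4$ and opinion 0 with probability $1/4$ at their next update.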

Finally we let $z_0$ and $z_1$ be the $F$-dimensional vectors of zealot influence over free users, where $z_{0,i}=\sum_{j\in\mathcal{Z}_0} w_{ij}$ is the total influence exerted by all zealots with opinion 0 onto user $i\in\mathcal{F}$. The definition of $z_1$ is analogous. The in-degree of a free node $i$ can then be written as $d_i = \sum_{j\in\mathcal{F}} w_{ij} +z_{0,i} +z_{1,i}$.



\section{Control of Opinion Diversity} \label{opinion_diversity_section}
We define the average diversity of opinion at equilibrium by
\begin{equation}
	\sigma = 4\bar{x}^*(1-\bar{x}^*)
\end{equation}
where $\bar{x}^*$ is the average opinion over all users at equilibrium, \emph{i.e.}\  the expected result when observing the opinion of a random node at a given instant. It is also the average share of opinion 1 within the network and is often referred to as \emph{magnetisation} in the literature.

$\sigma$ is the variance of the Bernoulli distribution of parameter $\bar{x}^*$, scaled by 4 so that it ranges in $[0,1]$. It describes the diversity of the system in that it is maximal when users are equally divided between both opinions ($\bar{x}^*=1/2$), and minimal when only one opinion is represented ($\bar{x}^*=0$ or $1$).
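
For instance, evaluating the definition at a few illustrative values of $\bar{x}^*$ (chosen here for the sake of example) gives
\begin{equation*}
	\sigma\big\vert_{\bar{x}^*=1/2} = 4\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = 1, \qquad
	\sigma\big\vert_{\bar{x}^*=1/4} = 4\cdot\tfrac{1}{4}\cdot\tfrac{3}{4} = \tfrac{3}{4}, \qquad
	\sigma\big\vert_{\bar{x}^*=1/10} = 4\cdot\tfrac{1}{10}\cdot\tfrac{9}{10} = 0.36.
\end{equation*}
By symmetry the same values are obtained for $\bar{x}^*=3/4$ and $\bar{x}^*=9/10$ respectively.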

\subsection{Maximisation in General Networks}
Let $L=(L_{ij})_{i,j\in\mathcal{F}}$ be the Laplacian of the \emph{free} graph, \emph{i.e.}\  the $F\times F$ matrix with elements $L_{ij} = \delta_{ij}\sum_{k\in\mathcal{F}} w_{ik} - (1-\delta_{ij}) w_{ij}$ where $\delta_{ij}$ is the Kronecker delta. \cite{masuda2015} showed that the average opinion amongst free users is:
\begin{equation}
	\bar{x}^*_f=\frac{1}{F}\mathbf{1}^\top[L + \text{diag}(z_0+z_1)]^{-1}z_1,
\end{equation}
where $\mathbf{1}$ is the $F$-dimensional vector filled with ones. The $i^{\text{th}}$ entry of the vector $x_f^*:=[L + \text{diag}(z_0+z_1)]^{-1}z_1$ is the average opinion of free user $i$ at equilibrium, given by
\begin{equation} \label{xfi}
	x_{f,i}^* = \frac{\sum_{j\in\mathcal{F}}w_{ij}x_{f,j}^*+z_{1,i}}{d_i}.
\end{equation}
Finally we have
\begin{equation}
	\bar{x}^* = \frac{F \bar{x}^*_f + \vert\mathcal{Z}_1\vert}{N}.
\end{equation}
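To make these formulas concrete, here is a small worked example on a hypothetical network of $N=4$ users, chosen purely for illustration. There are $F=2$ free users with $w_{12}=w_{21}=1$, one 0-zealot influencing user 1 with weight 1 and one 1-zealot influencing user 2 with weight 1, so that $z_0=(1,0)^\top$ and $z_1=(0,1)^\top$. Then
\begin{equation*}
	L + \text{diag}(z_0+z_1) = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \qquad
	x_f^* = \frac{1}{3}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/3 \\ 2/3 \end{pmatrix},
\end{equation*}
which agrees with eq.~(\ref{xfi}) since $x_{f,1}^* = (2/3+0)/2$ and $x_{f,2}^* = (1/3+1)/2$. It follows that
\begin{equation*}
	\bar{x}^*_f = \frac{1}{2}\left(\frac{1}{3}+\frac{2}{3}\right) = \frac{1}{2}, \qquad
	\bar{x}^* = \frac{2\cdot\frac{1}{2} + 1}{4} = \frac{1}{2}.
\end{equation*}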

Now consider a network where the set $\mathcal{Z}_0$ of 0-zealots and their influence vector $z_0$ are fixed. Given a predetermined quantity $\vert\mathcal{Z}_1\vert$ of 1-zealots, how should we set the values of $z_1$ to maximise the opinion diversity at equilibrium? Formally, we seek to solve
\begin{align} \label{P}
	\underset{z_1\ge0}{\text{argmax}} \quad &\sigma. \tag{P}
\end{align}
Recall that the objective is a function of $\bar{x}^*$, which is itself a function of $z_1$. Because $\bar{x}^*$ is increasing with $\Vert z_1\Vert$ and equals zero when $\Vert z_1\Vert=0$, there exists at least one optimal vector $z_1^\star$ for which $\bar{x}^*=1/2$ and thus (\ref{P}) is solved. This optimal vector can be found efficiently using gradient ascent on $\sigma$, as \cite{moreno2021} show that $\bar{x}^*_f$ (and thus $\bar{x}^*$) is concave with respect to $z_1$.

\subsection{Maximisation in Complete Networks}
In our application, we will consider a complete, unweighted network: $w_{ij}=\mathds{1}_{i\notin\mathcal{Z}}$ for all $i\neq j$. In that case all entries of $z_0$ are equal to the same value, which is the number of 0-zealot nodes within the graph. For the sake of simplicity we denote this unique value by $z_0$. We proceed similarly for $z_1$. It is then known \citep{mobilia2007,masuda2015} that
\begin{equation} \label{xbars_complete}
	\bar{x}^* = \bar{x}^*_f = x_{f,1}^* = \ldots = x_{f,{F}}^* = \frac{z_1}{z_0+z_1}.
\end{equation} 
In a previous work \citep{vendeville2022} we proved that this result also holds in expectation for any connected, unweighted graph where the position of zealots is drawn uniformly at random. Hence the following.
\begin{theorem} \label{sigma_complete}
	In a complete unweighted user graph with $z_0$ zealots with opinion 0 and $z_1$ zealots with opinion 1,
	\begin{equation}
		\sigma = \frac{4z_0z_1}{(z_0+z_1)^2}.
	\end{equation}
	This quantity is trivially maximal when $z_0=z_1$.
\end{theorem}
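As a quick check, substituting eq.~(\ref{xbars_complete}) into the definition of $\sigma$ gives
\begin{equation*}
	\sigma = 4\cdot\frac{z_1}{z_0+z_1}\cdot\left(1-\frac{z_1}{z_0+z_1}\right) = \frac{4z_0z_1}{(z_0+z_1)^2}.
\end{equation*}
For example, with the illustrative values $z_0=30$ and $z_1=10$ we get $\bar{x}^*=1/4$ and $\sigma = 4\cdot 300/1600 = 3/4$.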

In the same paper we studied the following problem: given a quantity $z_0$ of 0-zealots and a target average opinion $\lambda$, what is the optimal number $z_1^\star$ of free users that should be turned into 1-zealots in order for $\bar{x}^*$ to be as close to $\lambda$ as possible? This is a generalisation of the diversity maximisation problem presented above, which corresponds to the specific case $\lambda=1/2$.

Numerous empirical studies have found that rather than incentivising a change in opinion, presenting certain people with opposing views might actually entrench them even deeper in their beliefs. This is often referred to as the \emph{backfire effect}. To account for this phenomenon, we assumed that the creation of $z_1$ zealots with opinion 1 will \emph{radicalise} a quantity $\alpha z_1$ of free users, who will then become 0-zealots. Thus necessarily $z_0+(1+\alpha)z_1\le N$, which yields the constraint $z_1\le(N-z_0)/(1+\alpha)$. The real parameter $\alpha \in [0,1)$ quantifies the intensity of the backfire effect.

We quickly summarise our findings as they will be useful to us. We also take this opportunity to correct a small mistake that was found in the paper.
\begin{theorem}
	Assume there are $z_0$ zealots with opinion 0 in the system, and there exists a backfire effect of intensity $\alpha$. Set $z_1^{\textnormal{max}}:=(N-z_0)/(1+\alpha)$. After turning $z_1$ free users into zealots with opinion 1, $z_0$ is updated to $z_0+\alpha z_1$ and the average equilibrium opinion is 
\begin{equation} \label{xbars_complete_formula}
	\bar{x}^* = \frac{z_1}{z_0+(1+\alpha)z_1}.
\end{equation}
The solution to the problem
\begin{align} \label{P1}
	\underset{0\le z_1\le z_1^\textnormal{max}}{\textnormal{argmin}} \quad &(\bar{x}^*-\lambda)^2 \tag{P1}
\end{align}
is given by
\begin{equation} \label{optim_backfire}
	\begin{cases}
		z_1^\star = \textnormal{min} \left(z_1^{\textnormal{max}}, \lambda z_0d^{-1} \right) &\text{ if } d>0,\\
		z_1^\star = z_1^{\textnormal{max}} &\text{ if } d\leq0,
	\end{cases}
\end{equation}
where $d:=1-\lambda-\alpha \lambda$. Discarding the constraint $z_1\le z_1^\textnormal{max}$ results in an unbounded problem if $d\le 0$ and $z_1^\star=\lambda z_0d^{-1}$ if $d>0$. 
\end{theorem}
In our previous work we had mistakenly claimed that $z_0$ was updated to $z_0+\alpha z_0z_1$, leading to $\bar{x}^*=z_1/(z_1+(1+\alpha z_1)z_0)$ and $d=1-\lambda-\alpha \lambda z_0$. All results presented there still hold qualitatively. Finally note that in general $z_1^\star$ will not be an integer, so that in practical cases it would need to be rounded.
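
To illustrate eq.~(\ref{optim_backfire}), consider the hypothetical values $N=100$, $z_0=20$ and $\alpha=1/2$, so that $z_1^{\textnormal{max}}=80/1.5\approx 53.3$. For a target $\lambda=0.4$ we have $d=1-0.4-0.2=0.4>0$ and
\begin{equation*}
	z_1^\star = \textnormal{min}\left(53.3,\ \frac{0.4\cdot 20}{0.4}\right) = 20, \qquad
	\bar{x}^* = \frac{20}{20+1.5\cdot 20} = 0.4 = \lambda.
\end{equation*}
For a target $\lambda=0.8$ we have instead $d=1-0.8-0.4=-0.2\le 0$, so that $z_1^\star=z_1^{\textnormal{max}}$ and
\begin{equation*}
	\bar{x}^* = \frac{80/1.5}{20+80} \approx 0.53.
\end{equation*}
The target cannot be attained in this second case because, by eq.~(\ref{xbars_complete_formula}), $\bar{x}^*$ always stays below $1/(1+\alpha)=2/3$.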

In the present context we restrict ourselves to $\lambda=1/2$. The objective function $(\bar{x}^*-1/2)^2$ is equal to $\frac{1}{4}(1-\sigma)$, so that minimising it amounts to maximising $\sigma$, and the condition $d>0$ becomes $\alpha<1$, which is true by definition. Hence the following theorem.

\begin{theorem}
	The problem of maximising diversity in a complete unweighted graph with $z_0$ 0-zealots and backfire effect $\alpha$ is formally written as
\begin{equation}\label{P2}\tag{P2}
	\begin{aligned} 
	\underset{0\le z_1\le z_1^\textnormal{max}}{\textnormal{argmax}} \quad &\sigma_{z_0,\alpha}(z_1) \\
	\textnormal{s.t.} \quad &\sigma_{z_0,\alpha}(z_1) = \frac{4(z_0+\alpha z_1)z_1}{(z_0+(1+\alpha)z_1)^2} 
\end{aligned}
\end{equation}
where $z_1^{\textnormal{max}}=(N-z_0)/(1+\alpha)$. Its solution is given by
\begin{equation} 
	z_1^\star = \textnormal{min} \left(z_1^{\textnormal{max}}, \frac{z_0}{1-\alpha} \right).
\end{equation}
\end{theorem}
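As a sanity check with hypothetical values, take $N=100$, $z_0=20$ and $\alpha=1/2$. Then $z_1^{\textnormal{max}}\approx 53.3$ and $z_0/(1-\alpha)=40$, so that $z_1^\star=40$. The number of 0-zealots becomes $20+\frac{1}{2}\cdot 40=40$ and
\begin{equation*}
	\sigma_{z_0,\alpha}(z_1^\star) = \frac{4\cdot 40\cdot 40}{(20+1.5\cdot 40)^2} = \frac{6400}{6400} = 1.
\end{equation*}
If instead $z_0=40$, then $z_0/(1-\alpha)=80$ exceeds $z_1^{\textnormal{max}}=60/1.5=40$. The constraint is binding, $z_1^\star=40$, the number of 0-zealots becomes $60$ and $\sigma_{z_0,\alpha}(z_1^\star)=4\cdot 60\cdot 40/100^2=0.96$. Note that in this second case all users end up being zealots.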




\section{Density of Active Links at Equilibrium} \label{active_links_section}
Because maximising diversity does not guarantee that the network will not split into echo chambers, we are also interested in maximising the proportion of \emph{active links} at equilibrium. A link is said to be active if it joins two users with opposite opinions. We derive a mean-field approximation for this quantity that shows a tight fit with empirical averages obtained via numerical simulations. Let us first make precise what we mean by \emph{active links} and their average density.

We denote by $\mathcal{E}'$ the set of all edges present in the graph that join two users, one of them at least being free. We write $(i,j)$ to designate the edge $j\rightarrow i$ pointing outwards from $j$ and towards $i$. Because the graph is oriented, $(i,j)$ and $(j,i)$ are two separate objects and one might exist without the other. Moreover when both are present in the graph we do not necessarily have $w_{ij}=w_{ji}$. 

\begin{definition}[Active link]
At any time $t$ the directed link $(i,j)\in\mathcal{E}'$ is said to be \emph{active} if $x_i(t)\neq x_j(t)$, and inactive otherwise. 
\end{definition}
\begin{definition}[Average density of active links]
Let $q_{ij}$ be the equilibrium probability of the event $\{x_i\neq x_j\}$. We define the average density of active links at equilibrium by
\begin{equation} \label{rho_def}
	\rho = \frac{1}{\vert\mathcal{E}'\vert} \sum_{(i,j)\in\mathcal{E}'} q_{ij}
\end{equation}
and its weighted version by 
\begin{equation} \label{rhow_def}
	\rho_w = \frac{\sum_{(i,j)\in\mathcal{E}'} w_{ij}q_{ij}}{\sum_{(i,j)\in\mathcal{E}'} w_{ij}}.
\end{equation}
\end{definition}
In the weighted case, heavier edges count more towards the average and lighter ones count less. Note that $q$ is a non-oriented metric in that $q_{ij}=q_{ji}$. When $i$ and $j$ are both free users, $q_{ij}$ will be counted twice in the sum above if both $w_{ij}$ and $w_{ji}$ are positive, once if only one of them is and not counted at all if they are not connected with each other. Because zealots receive no influence from others, if $i$ is in $\mathcal{Z}$ then $w_{ij}=0$ and $q_{ij}$ is counted once if $w_{ji}>0$, not counted otherwise. If $j$ is a zealot as well then $q_{ij}$ is not counted in the sum.
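
As a toy illustration of the difference between the two definitions, suppose (hypothetically) that $\mathcal{E}'$ contains only three edges with weights $2$, $1$, $1$ and associated probabilities $q=0.5$, $0.2$, $0.4$. Then
\begin{equation*}
	\rho = \frac{0.5+0.2+0.4}{3} \approx 0.37, \qquad
	\rho_w = \frac{2\cdot 0.5 + 1\cdot 0.2 + 1\cdot 0.4}{2+1+1} = 0.4.
\end{equation*}
The weighted density is higher here because the heaviest edge is also the one most likely to be active.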

The following theorem is the main theoretical contribution of this paper.

\begin{theorem} \label{rhoij_theorem}
	A mean-field approximation for the values $q_{ij}$ is given by the solution of the following linear system:
\begin{align} \label{rhoij}
	q_{ij}(d_i+d_j) & -\sum_{k\in\mathcal{F}\backslash\{i,j\}} (w_{ik}q_{jk}+w_{jk}q_{ik}) 
	= \tilde{z}_j x^*_i + \tilde{z}_i x^*_j + z_{1,i} + z_{1,j},
\end{align}
where $(i,j)$ ranges over $\mathcal{E}'$, $\tilde{z}_k := z_{0,k}-z_{1,k}$, $d_k=\sum_{l=1}^N w_{kl}$ is the in-degree of node $k$, $x_k^*:=x_{f,k}^*$ for $k\in\mathcal{F}$, $x_k^*:=0$ for $k\in\mathcal{Z}_0$ and $x_k^*:=1$ for $k\in\mathcal{Z}_1$.
\end{theorem}
The proof can be found in \Cref{proof_section}. Consistent with intuition, if $j\in\mathcal{Z}_0$ (resp.\ $j\in\mathcal{Z}_1$) then the above yields $q_{ij}=x^*_{f,i}$ (resp.\ $q_{ij}=1-x^*_{f,i}$), which is simply the probability for $i$ to hold opinion 1 (resp.\ opinion 0).
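
To illustrate how the system (\ref{rhoij}) can be solved in a simple setting, consider as a sketch a complete unweighted graph with $z_0$ 0-zealots and $z_1$ 1-zealots, so that $z=z_0+z_1$ and $F=N-z$. For two free users $i$ and $j$ we have $d_i=d_j=N-1$, $w_{ik}=w_{jk}=1$, $\tilde{z}_i=\tilde{z}_j=z_0-z_1$, $z_{1,i}=z_{1,j}=z_1$ and $x_i^*=x_j^*=z_1/z$ by eq.~(\ref{xbars_complete}). Assuming by symmetry that all pairs of free users share a common value $q$, the system reduces to the single equation
\begin{equation*}
	2(N-1)\,q - 2(F-2)\,q = \frac{2(z_0-z_1)z_1}{z} + 2z_1 .
\end{equation*}
Since $N-1-(F-2)=z+1$ and $(z_0-z_1)+z=2z_0$, this gives
\begin{equation*}
	q = \frac{2z_0z_1}{(z_0+z_1)(z_0+z_1+1)}.
\end{equation*}
For example, with the illustrative values $z_0=z_1=10$ we obtain $q=200/420\approx 0.48$.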

\subsection{Numerical Validation}
We validate the above through a series of numerical simulations. Let us place ourselves in an Erdös-Rényi random graph with $N=100$ users, $\vert\mathcal{Z}_0\vert=23$ zealots with opinion 0 and $\vert\mathcal{Z}_1\vert=18$ zealots with opinion 1. The graph is directed and we set its density to $0.1$ so that about $10\%$ of all possible edges are present. Each edge is then attributed a weight generated uniformly at random between 0 and 1.

We perform a single simulation of the voter model on this graph for 50,000 time units. The empirical density of active links $\hat\rho$ is computed every 100 updates, starting once 10,000 time units have passed to ensure that the system has had time to stabilise. In \Cref{rho_simu_vs_theo} \emph{(top left)} we plot $\hat\rho$ over the last 1,000 time units against $\rho$. Averaging $\hat\rho$ over time yields our final empirical estimate. We proceed similarly for $\rho_w$ \emph{(top right)} and repeat the whole procedure on a Barabasi-Albert graph with weights drawn from an exponential distribution of parameter 1 \emph{(bottom left and right)}. This graph has density $\approx 0.1$ as well.

In all cases the theoretical values $\rho$ and $\rho_w$ are roughly the same. Thus for the same density of edges, the topology of the graph does not seem to play a very important role here. We obtain rather small errors between theory and simulation, in the order of $10^{-4}$ for the Erdös-Rényi graph and $10^{-3}$ for the Barabasi-Albert graph. Other experiments with different quantities of zealots, weight distributions and graph topologies have shown similarly small errors, confirming that our mean-field approximation of $q$ performs well in practice. The higher error for the Barabasi-Albert network is not too surprising, as this graph is inherently less regular and the variance in the weights is higher (exponential distribution of parameter 1 against uniform distribution over $[0,1]$). This can also be seen in the oscillations of $\hat\rho$ and $\hat\rho_w$ over time, which span a larger range in this case.

\begin{figure}[t]
	\centering
	\begin{subfigure}[b]{.5\textwidth}
		\includegraphics[width=\textwidth]{ER_weightedFalse_plot}
	\end{subfigure}~
	\begin{subfigure}[b]{.5\textwidth}
		\includegraphics[width=\textwidth]{ER_weightedTrue_plot}
	\end{subfigure}\\
	\begin{subfigure}[b]{.5\textwidth}
		\includegraphics[width=\textwidth]{BA_weightedFalse_plot}
	\end{subfigure}~
	\begin{subfigure}[b]{.5\textwidth}
		\includegraphics[width=\textwidth]{BA_weightedTrue_plot}
	\end{subfigure}
	\caption{Verifying \Cref{rhoij_theorem} in simulation. $N=100$ users, simulation time 50,000. Last 1,000 time units are plotted. \textbf{Top:} Erdös-Rényi graph with random uniform weights. \textbf{Bottom:} Barabasi-Albert graph with random exponential weights. \textbf{Left:} theoretical density of active links as per eq.~(\ref{rho_def}) (dotted red lines) and empirical value over time for a single simulation (blue oscillations). \textbf{Right:} same with weighted density of active links (\ref{rhow_def}). In each plot we also indicate the average over the whole simulation $(\hat\rho$ or $\hat\rho_w)$ as well as the theoretical value $(\rho$ or $\rho_w)$, both rounded to $10^{-4}$.}
	\label{rho_simu_vs_theo}
\end{figure}

\subsection{Maximisation in Complete Networks}
Now consider a complete unweighted graph. Again we let $z_0$ denote the number of 0-zealots and $z_1$ the number of 1-zealots in the network. Because of the asymmetry between free users and zealots it is convenient to still assume edges to be directed. Then the following holds.

\begin{theorem} \label{rho_complete}
	In a complete unweighted user graph with $z_0$ zealots with opinion 0 and $z_1$ zealots with opinion 1,
	\begin{equation} \label{rhoc_eq}
		\rho =  \frac{2z_0z_1(N-z_0-z_1)}{(N-1)(z_0+z_1)(z_0+z_1+1)}.
	\end{equation}
\end{theorem}
The proof can be found in \Cref{proof_section}. Now we are interested in finding the optimal number $z_1^\star$ of free users that should be turned into 1-zealots in order to maximise $\rho$ (which is equal to $\rho_w$ as the graph is unweighted). Because of the backfire effect $\alpha$, creating $z_1$ zealots with opinion 1 will entail a change in the number of 0-zealots, from $z_0$ to $z_0+\alpha z_1$.

\begin{theorem} \label{optim_rho_complete_theo}
The problem of maximising the density of active links in a complete unweighted graph with $z_0$ 0-zealots and backfire effect $\alpha$ is formally written as
\begin{equation} \label{P3} \tag{P3}
	\begin{aligned}
	\underset{0\le z_1\le z_1^{\textnormal{max}}}{\textnormal{argmax}} \quad &\rho_{z_0,\alpha}(z_1) \\
	\textnormal{s.t.} \quad &\rho_{z_0,\alpha}(z_1) =  \frac{2(z_0+\alpha z_1)z_1}{(z_0+(1+\alpha)z_1)(z_0+(1+\alpha)z_1+1)}
	\end{aligned}
\end{equation}
where $z_1^{\textnormal{max}}=(N-z_0)/(1+\alpha)$. It has at least one solution which is either $z_1^{\textnormal{max}}$ or a real positive root of the derivative.
\end{theorem}

This results directly from the fact that the objective is a continuous function of a single real variable on the closed interval $[0,z_1^{\textnormal{max}}]$, and that the optimum cannot be reached for $z_1=0$ as
\begin{equation}
	 \rho_{z_0,\alpha}(z_1) >0 = \rho_{z_0,\alpha}(0)
\end{equation}
for all $z_1>0$.
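
As an illustration, the root of the derivative can be written explicitly in the special case $\alpha=0$ (no backfire effect). This is only a sketch, taking the objective exactly as written in (\ref{P3}). We then have $\rho_{z_0,0}(z_1)=2z_0z_1/\big((z_0+z_1)(z_0+z_1+1)\big)$ and
\begin{equation*}
	\frac{\mathrm{d}\rho_{z_0,0}}{\mathrm{d}z_1}(z_1) = \frac{2z_0\left(z_0^2+z_0-z_1^2\right)}{(z_0+z_1)^2(z_0+z_1+1)^2},
\end{equation*}
which is positive for $z_1<\sqrt{z_0(z_0+1)}$ and negative afterwards. Hence in this case
\begin{equation*}
	z_1^\star = \textnormal{min}\left(z_1^{\textnormal{max}}, \sqrt{z_0(z_0+1)}\right).
\end{equation*}
For example, with the hypothetical values $z_0=20$ and $N=100$ we get $z_1^{\textnormal{max}}=80$ and $z_1^\star=\sqrt{420}\approx 20.5$, slightly more than $z_0$.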

\section{Application to US Congress Data}
We evaluate our results on real-life data from American politics. The Voteview dataset \citep{voteview} contains very detailed information about the United States Congress since its inception in 1789. Members of each Senate and House of Representatives are listed with their affiliations and how they voted in each roll call. We discard voting data and simply focus on the composition of the House of Representatives since 1947. During this time, the combined proportion of Democrats and Republicans therein is always above 99.5\%, which justifies a binary approach such as the voter model.

Let $D_k,R_k$ be the respective numbers of Democrat and Republican representatives in the House during the $k^\text{th}$ congress. The $80^\text{th}$ congress started in 1947 and the $117^\text{th}$ in 2021, so that $k$ would range in $\{80,\ldots,117\}$. For the sake of simplicity however we shift the indices by 79 and let $k\in\{1,\ldots,38\}$. We discard members of other parties from our analysis, as they never represent more than 0.5\% of the House. We find ourselves with two vectors $D,R$ of length $K=38$ and assume they correspond to snapshot observations of a single realisation of the voter model with zealots on a complete, unweighted graph.

Because members of the House change between congresses, users cannot represent \emph{persons} here. Rather, they represent \emph{seats} of the House, and their \emph{opinion} is the \emph{party} to which the representative occupying the seat is affiliated. For clarity and consistency we keep the usual terms of users and opinions, but it is important to keep in mind what they precisely mean here.

In each congress there is a small number of non-voting delegates. They are included in our analysis but their exact number may vary so that the total number of seats $N_k$ is not always the same. In the congresses considered here, and with non-Democrat, non-Republican members discarded, $N_k$ varies between 438 (3 non-voting delegates) and 453 (18 non-voting delegates).

\subsection{Parameters Estimation}
The number of zealots on each side is unknown to us. Zealots represent the seats that are ``locked'' by each party, and we denote their numbers by $z_D$ (Democrat zealots) and $z_R$ (Republican zealots). To infer them, one could for example consider as zealots the users who most often agree with others from their party. This would however require the choice of a heuristic threshold to define what counts as ``most''. Moreover, we do not have the vote of every member for each roll call, which would add uncertainty and potential errors. Rather, we choose to infer $z_D$ and $z_R$ from the two equilibrium metrics presented in sections \ref{opinion_diversity_section} and \ref{active_links_section}: $\sigma$ and $\rho$. 

We propose the following estimate $\hat z=(\hat z_D, \hat z_R)$ for $(z_D,z_R)$:
\begin{equation} \label{zhat} \tag{Q}
	\begin{aligned} 
		\hat z = \underset{(z_D,z_R)\in Z}{\text{argmin}} \quad &\epsilon \\
		\text{s.t.} \quad &\epsilon=\frac{\vert\hat\sigma-\sigma\vert + \vert\hat\rho-\rho\vert}{2}, 
	\end{aligned}
\end{equation}
where $\hat\sigma$ is an empirical estimate of $\sigma$ and $\hat\rho$ is an empirical estimate of $\rho$. Thus $\hat z$ is a minimiser of the mean distance between theoretical and empirical values of our metrics. The set $Z:=\{1,\ldots,D_\text{min}\}\times\{1,\ldots,R_\text{min}\}$, with $D_\text{min}=\min_k D_k$ and $R_\text{min}=\min_k R_k$, constrains the number of zealots in each party to never exceed the number of members this party has in the House. 

The empirical diversity of opinions is directly derived as
\begin{equation} 
	\hat\sigma := \frac{4}{K}\sum_{k=1}^K \frac{D_kR_k}{(D_k+R_k)^2}
\end{equation}
where $K=38$ is the total number of observations. The empirical estimation of $\rho$ is a bit trickier, as links are treated differently depending on whether they join two free users, two zealots or a free user and a zealot. Hence we need to know which nodes are free and which are zealous. The empirical density of active links is given by
\begin{equation} 
	\hat\rho := \frac{1}{K}\sum_{k=1}^K \frac{2D_kR_k-D_kz_R-R_kz_D}{N_k(N_k-1)}.
\end{equation}
Here $N_k$ is the number of nodes in the $k^\text{th}$ observation, so that the denominator is the total number of directed links in the corresponding graph. The numerator of the fraction was simplified from
\begin{equation}
	2(D_k-z_D)(R_k-z_R)+(D_k-z_D)z_R+(R_k-z_R)z_D
\end{equation}
where the first term is the number of directed links joining free nodes of opposite parties, the second term the number of links between free Democrats and zealous Republicans, and the third term the number of links between free Republicans and zealous Democrats. 
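
As a check on this simplification, expanding each product gives
\begin{align*}
	2(D_k-z_D)(R_k-z_R) &= 2D_kR_k - 2D_kz_R - 2R_kz_D + 2z_Dz_R, \\
	(D_k-z_D)z_R+(R_k-z_R)z_D &= D_kz_R + R_kz_D - 2z_Dz_R,
\end{align*}
and summing the two lines cancels the $z_Dz_R$ terms, leaving $2D_kR_k-D_kz_R-R_kz_D$.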

The cardinality of $Z$ is small enough that (\ref{zhat}) can be solved by performing an exhaustive search over the whole set. We find:
\begin{align*}
	(D_\text{min},R_\text{min}) &= (190,143), \\
	(\hat z_D, \hat z_R) &= (89, 63), \\
	(\hat\sigma,\hat\rho) &\simeq (0.97, 0.32), \\
	\epsilon &\simeq 3.8\cdot 10^{-5}.
\end{align*}
The small error $\epsilon$ indicates that the optimisation was effective. The diversity is close to the theoretical optimum, indicating that the number of members from both parties is balanced over time. The majority switches back and forth between Democrats and Republicans but neither seems to truly have the upper hand over time. The density of active links is fairly high but could be better, as it pales in comparison to the values we obtained during simulations on synthetic networks (\Cref{rho_simu_vs_theo}). 

\subsection{Optimising Diversity and Activity}
In this section, we investigate how $\sigma$ and $\rho$ could be optimised by changing the number of zealots. To this end, we solve problems (\ref{P2}) and (\ref{P3}) for intensities of the backfire effect $\alpha$ spanning the whole range $[0,1)$. First, we consider that Democrats correspond to opinion 0 and Republicans to opinion 1, and we optimise over $z_D$ while keeping $z_R=\hat z_R$ fixed. Doing this we obtain an optimal number $z_D^\star$ of zealous Democrats when the number of zealous Republicans is the one inferred from the data. Second, we do the opposite and optimise over $z_R$ while keeping $z_D=\hat z_D$ fixed, obtaining a maximiser $z_R^\star$.

The two plots on the left part of \Cref{data_optim} describe the solution of (\ref{P2}) as a function of $\alpha$. The upper plot contains the values of the maximiser $z_D^\star$, its upper bound $z_D^\text{max}$ and the empirical number of zealots $\hat z_D$ inferred from the data. The same quantities are shown for the Republican party: $z_R^\star$, $z_R^\text{max}$ and $\hat z_R$. The bottom plot contains the optimal values $\sigma(z_D^\star)$ and $\sigma(z_R^\star)$ of the objective, when optimising respectively over $z_D$ and over $z_R$. Its empirical value $\hat\sigma$ inferred from the data is also represented. We also show $\rho(z_D^\star)$, $\rho(z_R^\star)$ and $\hat\rho$, in order to assess what effect optimising the diversity has on the active link density.

We observe that for values of $\alpha$ up to about $0.7$, $\sigma$ can be fully optimised to its maximum value 1 whether we act on Democrat or Republican zealots. Even for the highest values of $\alpha$ it is very close to the empirical value. The impact of optimising $\sigma$ on $\rho$ is rather negative, as for almost all $\alpha$ the resulting density of active links is lower than its empirical value. Looking at the upper plot, we see that the maximisers are always above empirical values, meaning zealots would have to be added in order to improve diversity. Starting from $\alpha\simeq 0.6$ (Republicans) and $\alpha\simeq 0.7$ (Democrats), the maximiser is equal to its maximum possible value so that the system is saturated, with all nodes being zealots.

We now turn to the problem of maximising the density of active links (\ref{P3}). Its solution is described by the two plots on the right part of \Cref{data_optim}. Unlike before, here the optimal number of zealots decreases with $\alpha$, is constantly lower than its empirical estimate and never approaches its upper bound. Moreover, this time optimising $\rho$ can also improve $\sigma$. This happens when the number of Democrat zealots is acted upon, roughly for $\alpha\le 0.5$. 

Finally, we remark that acting upon Democrat zealots is always more effective than acting upon Republican ones. This might stem from the fact that the former are more numerous according to our empirical estimate, leaving more room to act on the objective functions.


\begin{figure}[t]
	\centering
	\begin{subfigure}[b]{\textwidth}
		\includegraphics[width=\textwidth]{data_argoptim}
	\end{subfigure}\\
	\vspace{.2cm}
	\begin{subfigure}[b]{\textwidth}
		\includegraphics[width=\textwidth]{data_optim}
	\end{subfigure}
	\caption{Optimal opinion diversity \textbf{(left)} and active links density \textbf{(right)} as a function of the backfire effect $\alpha$. \textbf{Top:} maximisers compared with empirical estimates and maximum possible values. \textbf{Bottom:} objective values and impact on the other metric, compared with empirical estimates.}
	\label{data_optim}
\end{figure}


\section{Proofs} \label{proof_section}

\begin{proof}[Proof of \Cref{rhoij_theorem}.]
Let $\lambda_{ij}$ be the average rate at which user $i$ adopts the same opinion as $j$ while in equilibrium. Remember that each user updates their opinion at the times of an exponential clock of parameter 1. There are four different events that lead to $i$ adopting $j$'s opinion, described below with the associated frequency rates. 
\begin{itemize}
	\item $i$ may copy $j$ directly, which happens at rate $d_i^{-1}w_{ij}$, or
	\item $i$ may copy a third free user $k$ holding the same opinion as $j$, which happens at rate $d_i^{-1}\sum_{k\in\mathcal{F}\backslash\{i,j\}} w_{ik}(1-q_{jk})$, or
	\item $i$ may copy a 1-zealot while $j$ has opinion 1, which happens at rate $d_i^{-1}z_{1,i}x^*_j$, or
	\item $i$ may copy a 0-zealot while $j$ has opinion 0, which happens at rate $d_i^{-1}z_{0,i}(1-x^*_j)$. 
\end{itemize}
By using $q_{jk}$ and $x^*_j$ we made the mean-field assumption that $i$ interacts with the average system at equilibrium rather than with its exact state. Through comparison with simulations we will show that this approximation performs well numerically. Putting it all together,
\begin{align}
	\lambda_{ij} &= d_i^{-1} \left(w_{ij} + \sum_{k\in\mathcal{F}\backslash\{i,j\}} w_{ik}(1-q_{jk}) + z_{1,i}x^*_j + z_{0,i}(1-x^*_j)\right).
	\intertext{Via an analogous reasoning, at equilibrium $i$ adopts the opinion opposite of $j$'s with rate}
	\mu_{ij} &= d_i^{-1} \left(\sum_{k\in\mathcal{F}\backslash\{i,j\}} w_{ik}q_{jk} + z_{1,i}(1-x^*_j) + z_{0,i}x^*_j \right).
\end{align}
We obtain $\lambda_{ji}$ and $\mu_{ji}$ in a similar fashion. The discrete quantity $\mathds{1}_{x_i\neq x_j}$ describes a continuous-time Markov chain with two states 0 and 1, transitioning from 0 to 1 with rate $\mu_{ij}+\mu_{ji}$ and from 1 to 0 with rate $\lambda_{ij}+\lambda_{ji}$. The stationary probability of state 1 is exactly $q_{ij}$, so that
\begin{equation}
	q_{ij} = \frac{\mu_{ij}+\mu_{ji}}{\lambda_{ij}+\mu_{ij}+\lambda_{ji}+\mu_{ji}}.
\end{equation}
After simplifications we obtain eq.~(\ref{rhoij}). 
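
In more detail, the expression for $q_{ij}$ follows from the balance condition of the two-state chain: writing $\pi_1$ for the stationary probability of state 1,
\begin{equation*}
	\pi_1(\lambda_{ij}+\lambda_{ji}) = (1-\pi_1)(\mu_{ij}+\mu_{ji}),
\end{equation*}
which is solved by $\pi_1=q_{ij}$ as given above. As a sketch of the subsequent simplification, and under the assumption (to be checked against the model definition) that $d_i$ denotes the total incoming weight of $i$, that is $d_i=\sum_{k\in\mathcal{F}\backslash\{i\}}w_{ik}+z_{0,i}+z_{1,i}$, the two rates of each user sum to one:
\begin{equation*}
	\lambda_{ij}+\mu_{ij} = d_i^{-1}\left(w_{ij}+\sum_{k\in\mathcal{F}\backslash\{i,j\}}w_{ik}+z_{1,i}+z_{0,i}\right) = 1.
\end{equation*}
This is consistent with each user updating at rate 1 and ending each update either in agreement or in disagreement with $j$. The denominator then equals 2 and $q_{ij}=(\mu_{ij}+\mu_{ji})/2$.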
\end{proof}

\begin{proof}[Proof of \Cref{rho_complete}.]
Let $(i,j)\in\mathcal{E}'$. First if $i\in\mathcal{F}$ is free and $j\in\mathcal{Z}_0$, then as discussed before eq.~(\ref{rhoij}) immediately yields $q_{ij}=\bar{x}^*$. If $j\in\mathcal{Z}_1$ then $q_{ij}=1-\bar{x}^*$ and in both cases, $q_{ji}=0$.

Now assume $i,j\in\mathcal{F}$. Because all free nodes are topologically equivalent, they share the same value $q_f$ for $q$, just as they have the same average opinion $\bar{x}^*=z_1/(z_0+z_1)$, \emph{cf.}\  (\ref{xbars_complete}). Replacing edge weights with 1, in-degrees with $N-1$ and zealot influences $z_{0,k},z_{1,k}$ with $z_0,z_1$, equation~(\ref{rhoij}) becomes
\begin{equation}
	2q_f(N-1)-(F-2)2q_f = 2(z_0-z_1)\bar{x}^* + 2z_1 
\end{equation}
and after simplifications
\begin{equation}
	q_f = \frac{2z_0z_1}{(z_0+z_1)(z_0+z_1+1)}.
\end{equation}
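The simplification can be made explicit as follows. Since $N-1=(F-1)+z_0+z_1$, the left-hand side equals $2q_f\bigl((N-1)-(F-2)\bigr)=2q_f(z_0+z_1+1)$, while substituting $\bar{x}^*=z_1/(z_0+z_1)$ in the right-hand side gives
\begin{equation*}
	2(z_0-z_1)\frac{z_1}{z_0+z_1}+2z_1 = \frac{2z_1(z_0-z_1)+2z_1(z_0+z_1)}{z_0+z_1} = \frac{4z_0z_1}{z_0+z_1}.
\end{equation*}
Dividing both sides by $2(z_0+z_1+1)$ yields the stated value of $q_f$.
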
Because there are $F(F-1)$ directed edges between free users, $Fz_0$ edges between free users and 0-zealots and $Fz_1$ edges between free users and 1-zealots, the total \textbf{number} (and not density) of active links is given by
\begin{align}
	n_{\text{active}} &= F(F-1)q_f + Fz_0\bar{x}^* + Fz_1(1-\bar{x}^*).
	\intertext{Replacing $F$ by $N-z_0-z_1$ and $q_f,\bar{x}^*$ by their respective values we find}
	n_{\text{active}} &= \frac{2z_0z_1N(N-z_0-z_1)}{(z_0+z_1)(z_0+z_1+1)}.
\end{align}
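The intermediate steps are as follows. With $\bar{x}^*=z_1/(z_0+z_1)$, the two zealot terms contribute $Fz_0\bar{x}^*+Fz_1(1-\bar{x}^*)=2Fz_0z_1/(z_0+z_1)$, so that
\begin{equation*}
	n_{\text{active}} = \frac{2Fz_0z_1}{z_0+z_1}\left(\frac{F-1}{z_0+z_1+1}+1\right) = \frac{2Fz_0z_1}{z_0+z_1}\cdot\frac{F+z_0+z_1}{z_0+z_1+1},
\end{equation*}
and the result follows from $F+z_0+z_1=N$.
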
Finally there are $N(N-1)$ directed edges in the complete graph so that we immediately obtain (\ref{rhoc_eq}) via $\rho=n_{\text{active}}/(N(N-1))$.
\end{proof}


\section{Conclusion and Future Work} \label{futurework}
In this paper we analysed the voter model with zealots on directed, weighted networks. We proposed formulas for the opinion diversity ($\sigma$) as well as for the density of active links ($\rho$) at equilibrium. The latter relied on a mean-field approximation that we showed performs well against numerical simulations. For both metrics we studied the problem of maximising it by turning free (\emph{i.e.}\  non-zealous) users into zealots in the presence of a backfire effect. We provided explicit solutions for the specific case of a complete unweighted network, and for opinion diversity we also showed how it could be maximised in the general directed, weighted case.

As an example application, we applied our findings to a dataset detailing the composition of the US House of Representatives since 1947. Assuming the data was a realisation of the voter model with zealots, we estimated the number of zealots by minimising the distance between empirical and theoretical values of the equilibrium metrics $\sigma$ and $\rho$. The opinion diversity was found to be almost maximal, indicating a balanced mix of Democrats and Republicans. We then used the optimisation problems presented in the theoretical sections to find the numbers of zealots that maximise $\sigma$ and $\rho$. Of note, we found that maximising $\rho$ by acting on Democrat zealots can help increase both $\rho$ and $\sigma$.

There are many open leads for further investigation. First, we considered multiple congresses at once. It could be interesting to subdivide the data into several windows, separated by impactful historical moments (fuel crisis in the seventies, end of the USSR in the early nineties, etc.). There might be patterns inherent to specific periods that are not apparent in our analysis. Data from online social networks could also provide interesting examples of polarised systems.

On the theoretical side, an efficient algorithm for the optimisation of $\rho$ on directed, weighted networks could help study more refined data. In the case of social media data, it could also be a good idea to optimise not over the number of zealots but over the edge weights. This would mean taking the point of view of a platform administrator trying to update its recommendation algorithm in order to improve opinion diversity or active links density. Such approaches have been tried in other models of opinion dynamics \citep{chitra2020,santos2021}. Finally, because we are studying polarised systems, incorporating signed edges in the model could also yield more informative results \citep{keuchenius2021}. 

\justify{
\subsubsection*{Data Availability}
The data used in the application is taken from \cite{voteview}. All code used and simulation data are available online at \url{https://github.com/antoinevendeville/howopinionscrystallise}.}

\justify{
\subsubsection*{Acknowledgements}
The authors have no competing interests to declare. This project was funded by the UK EPSRC grant EP/S022503/1 that supports the Centre for Doctoral Training in Cybersecurity delivered by UCL's Departments of Computer Science, Security and Crime Science, and Science, Technology, Engineering and Public Policy.}

\bibliographystyle{abbrvnat}
