\section{Experiments}
\label{sec:experiments}

This section evaluates our proposed policies, OPTUENT-EXP and OPTUENT-OPT, which select the next vertex to label according to Eq.~(\ref{eq15}) and Eq.~(\ref{eq16}), respectively. The evaluation examines both the effectiveness and the robustness of these policies in the labeling process.

\begin{table}[t]
\centering
\caption{Statistics of the datasets.}
\begin{tabular}{c|c|c|c}
\hline
\textbf{Dataset} & \textbf{\#Vertex} & \textbf{\#Pos} & \textbf{\#Neg} \\ \hline

Cora & 2708  & 1296  & 1412   \\ \hline
Citeseer & 3312  & 1618  & 1694   \\ \hline
Pubmed & 19717 & 7875 & 11842 \\ \hline
WebKB & 877 & 415 & 462 \\ \hline
\end{tabular}
\label{table: Datasets Statistics}
\end{table}

\subsection{Dataset and Evaluation Metrics}
\label{dataset}

We evaluate our proposed policies on four graph datasets. Three of them, Cora, Citeseer, and Pubmed~\citep{bojchevski2017deep}, are well-established citation networks, while WebKB~\citep{craven1998learning} comprises web pages from computer science departments at various universities. We follow the procedure of~\citet{pmlr-v216-kulkarni23a} to convert these datasets into binary-class form. Dataset statistics are given in Table~\ref{table: Datasets Statistics}. We use \textit{accuracy} as the performance metric for instance labeling.

\subsection{Experimental Settings}
\label{experimental_settings}

Our goal is to exploit label correlations to reduce labeling costs. We therefore concentrate our experiments on low-budget scenarios, which show whether the methods remain effective when resources are constrained.

We simulate worker labeling behavior on all four datasets. We first fix the parameter $\theta_{v_i}$ for every $v_i \in V$ and then generate worker labels according to $y_{v_i} \sim \mathrm{Bernoulli}(\theta_{v_i})$. We consider two settings: a fixed setting, in which $\theta_{v_i}$ takes the same value for all $v_i \in V$, chosen from 0.65, 0.7, 0.75, 0.8, and 0.85, and a uniform setting, in which each $\theta_{v_i}$ is sampled from $\mathcal{U}(0.7, 0.85)$. Due to space constraints, the main paper reports results for the fixed setting with $\theta_{v_i} = 0.65$ and for the uniform setting; results for the remaining fixed values (0.7, 0.75, 0.8, and 0.85) are discussed in Appendix~\ref{sec:different_theta}.
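
To illustrate how noisy a small number of simulated labels is, the following sketch works out the spread of the label model above. It assumes that the labels collected on a vertex are independent draws from the same Bernoulli distribution; the symbol $n_v$ (the number of labels collected on $v$) and the empirical mean $\bar{y}_v$ are notation introduced here for illustration only.
\[
\mathrm{Var}(y_v) = \theta_v (1 - \theta_v), \qquad
\mathrm{Var}(\bar{y}_v) = \frac{\theta_v (1 - \theta_v)}{n_v}.
\]
For the fixed setting $\theta_v = 0.65$, a single label has variance $0.65 \times 0.35 = 0.2275$, that is, a standard deviation of about $0.477$. Under these assumptions, the standard deviation of $\bar{y}_v$ shrinks only as $1/\sqrt{n_v}$, so vertices that receive very few labels remain highly uncertain.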

At each timestamp, we train a new random forest regression (RFR) model, adding each newly acquired crowd-worker label to the training data. The RFR model is evaluated separately, with detailed results in Appendix~\ref{sec:rfr}. Under Eq.~(\ref{eq15}) and Eq.~(\ref{eq16}), the policy would ideally compute the reward for every vertex in the graph and select the one with the highest reward.
Because belief propagation is computationally expensive, we instead follow~\citet{pmlr-v216-kulkarni23a} and uniformly sample 10 candidate vertices at each timestamp, compute their rewards, and select the candidate with the highest reward. As shown in Appendix~\ref{sample_size}, increasing the sample size beyond 10 yields minimal accuracy gains, which indicates that small candidate sets are sufficient.
We run every experiment with three random seeds (11, 42, and 111) and report the mean; the corresponding standard deviations are shown in Figure~\ref{fig:main_plot_stdev} in the Appendix. All experiments are performed on a single Nvidia GeForce RTX 3060 GPU.
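
The candidate-sampling step described above can be written compactly. This is only a sketch: the symbols $C_t$ (the candidate set at timestamp $t$), $R_t(v)$ (the reward of vertex $v$ under Eq.~(\ref{eq15}) or Eq.~(\ref{eq16})), and $v_t^{\ast}$ (the selected vertex) are notation introduced here for illustration, and we assume the candidates are drawn uniformly from $V$ without further restrictions.
\[
C_t \subset V, \quad |C_t| = 10, \qquad
v_t^{\ast} = \arg\max_{v \in C_t} R_t(v).
\]
Under this reading, each timestamp requires 10 reward evaluations instead of $|V|$; for Pubmed, this is 10 rather than 19717.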

\subsection{Baseline Methods}
\label{baseline_methods}

The reward functions of both OPTKG~\citep{chen2013optimistic} and KG~\citep{frazier2008knowledge} imply that annotating an unlabeled node is always preferable to annotating a node with one label, which in turn is preferable to annotating a node with two labels, regardless of the specific labels obtained. The two policies adopt different selection strategies only once all nodes have received two labels. As shown by~\citet{chen2013optimistic}, these differences emerge when the budget is sufficiently large; for instance, in a simulation with 50 instances, the methods diverge when the budget reaches 3K, which is 60 times the number of instances. Since the budget in all our experiments is less than twice the number of instances, the two policies behave similarly, and we therefore compare our proposed policies with OPTKG only. Overall, our comparisons include the following policies:

\begin{enumerate}
    \item \textit{Uniform}: This policy randomly samples one vertex from $V$ at each timestamp to obtain worker labels.
    \item \textit{OPTKG}: The Optimistic Knowledge Gradient policy~\citep{chen2013optimistic} treats each instance as independent and identically distributed (i.i.d.) and selects the instance with the highest optimistic reward at each timestamp. The reward is defined as the change in the marginal probabilities of vertices between two timestamps.
    \item \textit{GraphOBA-EXP}: As defined by~\citet{pmlr-v216-kulkarni23a}, this policy calculates the reward as the change in the sum of posterior probabilities of the vertices in the graph between two timestamps. It selects the vertex that maximizes the expected reward at each timestamp and relies on belief propagation to disseminate labeling information throughout the graph.
    \item \textit{GraphOBA-OPT}: Also proposed by~\citet{pmlr-v216-kulkarni23a}, this policy chooses the next vertex based on the optimistic expected reward. Like GraphOBA-EXP, it relies on belief propagation to disseminate labeling information.
\end{enumerate}
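
The budget argument that precedes the list can be checked with simple arithmetic. Writing $T$ for the labeling budget (a symbol introduced here for illustration), the divergence point reported by~\citet{chen2013optimistic} corresponds to $T / |V| = 3000 / 50 = 60$, whereas our experiments satisfy $T < 2|V|$. Using the vertex counts in Table~\ref{table: Datasets Statistics}, this upper bound evaluates to:
\[
\begin{array}{l|r|r}
\mathrm{Dataset} & |V| & 2|V| \\ \hline
\mathrm{Cora} & 2708 & 5416 \\
\mathrm{Citeseer} & 3312 & 6624 \\
\mathrm{Pubmed} & 19717 & 39434 \\
\mathrm{WebKB} & 877 & 1754
\end{array}
\]
All budgets in our experiments therefore stay far below the ratio of 60 at which OPTKG and KG were observed to diverge.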

\begin{figure*}[t]
    \centering
    \includegraphics[width=\textwidth]{Figures/main_plot.pdf}
    \caption{Performance comparison on four graph datasets. The top four plots compare OPTUENT-OPT and OPTUENT-EXP with the baselines under scenario 3 for a fixed $\theta_{v} = 0.65$; the bottom four plots show the same comparison for $\theta_{v}$ sampled from the uniform distribution $\mathcal{U}(0.7, 0.85)$.}
    \label{fig:main_plot}
\end{figure*}

We evaluate our proposed policies against the baseline policies in three scenarios: (1) \textbf{Without BP and RFR:} we compare the Uniform and OPTKG policies, without any enhancements, directly with our proposed methods; (2) \textbf{With BP and without RFR:} we add belief propagation (BP) to the Uniform and OPTKG policies to disseminate labeling information, which isolates the effect of BP from that of random forest regression (RFR); (3) \textbf{With BP and RFR:} we add both BP and RFR to the Uniform and OPTKG policies and add RFR to the GraphOBA-EXP and GraphOBA-OPT policies, so that every method is evaluated with the full capabilities of BP and RFR.

Due to space constraints, we present the findings for scenario 3 in the main paper; the results for scenarios 1 and 2 are detailed in Appendix~\ref{sec:scenario_1_and_2}.

\subsection{Results and Discussion}
\label{results_and_discussion}
In low-budget settings, all methods exhibit higher variance due to limited initial information, a known challenge in MDP-based sequential decision making. Our entropy-based selection nevertheless acquires labels more stably than the greedy baselines, as shown by the standard deviation plots (Appendix Figure~\ref{fig:main_plot_stdev}).
Figure~\ref{fig:main_plot} compares the proposed policies, OPTUENT-OPT and OPTUENT-EXP, with the baseline methods in scenario 3 on the WebKB, Cora, Citeseer, and Pubmed datasets. It covers two settings for $\theta_{v}$, $v \in V$: one fixed at 0.65 and the other sampled from the uniform distribution $\mathcal{U}(0.7, 0.85)$. For brevity, we omit the subscript $i$ from the vertices. The Uniform policy, which samples vertices randomly, and the OPTKG policy, which selects vertices in a round-robin manner, perform worst among the baselines. The GraphOBA-OPT and GraphOBA-EXP policies, which use posterior probabilities for vertex selection, perform better. Our proposed policies, which take into account both the posterior probabilities of vertices and the marginal probabilities of edges, outperform all baseline methods. These findings underscore the importance of accurately estimating instance correlations, which can vary from edge to edge. By capturing these correlations, our approach reduces data labeling costs.

When $\theta_{v}$ is fixed at 0.65, an individual worker can theoretically achieve an accuracy of 0.65 after repeated labeling. When $\theta_{v}$ is sampled from the uniform distribution $\mathcal{U}(0.7, 0.85)$, the expected accuracy rises to 0.775, the mean of the distribution. Because of the limited budget in our experiments, only a small number of vertices can be labeled repeatedly. Despite this constraint, our proposed policies consistently outperform individual workers by a substantial margin. The baseline methods also improve on individual workers, because belief propagation (BP) spreads labeling information through the graph and thereby provides additional context for each label. Our approach thus benefits from both the selection policy and information propagation.
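
As a quick check of the expected accuracy quoted above for the uniform setting, and assuming that $\theta_v$ is read as the probability that a single worker label on $v$ is correct, the value follows from the mean of the uniform distribution:
\[
\mathrm{E}[\theta_v] = \int_{0.7}^{0.85} \frac{\theta}{0.85 - 0.7} \, d\theta = \frac{0.7 + 0.85}{2} = 0.775 .
\]
This is $0.125$ above the fixed setting of $0.65$, so the uniform setting corresponds to less noisy workers on average.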

In terms of performance stability, our proposed policies are more consistent than the baselines. This suggests that the vertices they select cope well with the inherent uncertainty in worker-provided labels. Our reward estimate accounts for both potential outcomes, a worker label of $+1$ or $-1$, while the actual reward depends on the label received. Because it also factors in the influence of worker labels on both the vertices and the edges of the graph, the reward computation is more robust to uncertainty in worker labeling.
