\section{Methodology}
\label{sec:methodology} 

This work addresses the budget allocation problem in instance graphs by leveraging correlations between adjacent nodes to reduce data labeling costs. We adopt a Bayesian framework to systematically reduce uncertainty in both instance labeling and correlation estimation (Section \ref{bayesian_setup}). Label propagation is formalized in Sections \ref{factor_graph_estimation} and \ref{labeling_information_propagation}, and the budget allocation problem is reformulated as an entropy minimization problem over both vertices and edges (Section \ref{objective_function}). To solve this problem efficiently, we model it as a Markov Decision Process (MDP), which lets us decompose the expected uncertainty into stage-wise rewards. We introduce a novel reward function to estimate these rewards (Section \ref{sec:optimal_policy}) and propose two efficient approximate policies for instance selection, each of which greedily chooses the instance to label at every decision step (Section \ref{approximate_policy}).

\subsection{Bayesian Setup}
\label{bayesian_setup}

The input to our method is an unlabeled graph devoid of any information regarding the true labels of its vertices and edges. Following the approach of~\citep{pmlr-v216-kulkarni23a}, we initialize $\theta_{v_i}$ for each vertex $v_i \in V$ using a Beta prior distribution, specifically Beta($\alpha, \beta$). This initialization can be interpreted as assigning $\alpha$ positive and $\beta$ negative pseudo-labels to each vertex $v_i$ at the outset.

For each vertex, we define two key probabilities: the marginal probability and the posterior probability. The marginal probability is derived from the Beta initialization and the labels obtained from workers, while the posterior probability incorporates both the marginal probability and the labeling information propagated from neighboring vertices within the graph. This posterior probability is crucial for estimating $\theta_v$ for all $v \in V$.

As worker labels are obtained for a vertex $v_i$, its marginal probability is updated accordingly. In line with the Bayesian framework, we define the state matrix $\mathbf{S^{t}}$, an $N \times 2$ matrix that represents the marginal probabilities of the vertices at timestamp $t$, where $2$ corresponds to the two possible labels in our binary classification task. At each subsequent timestamp, the policy determines which vertex to select, and the obtained worker label prompts an update to the marginal probability of that vertex, resulting in a transition to a new state, $\mathbf{S^{t+1}}$.

We note that the new state $\mathbf{S^{t+1}}$ is fully determined by the current state $\mathbf{S^{t}}$, the selected vertex ${v}_{t}$ at timestamp $t$, and the worker label $y_{{v}_{t}}$ obtained for that vertex. This relationship establishes $\mathbf{S^{t}}$ as a Markovian process. Moreover, the marginal probability for the current vertex ${v}_{t}$ is calculated as follows:
 \begin{align*}
    P_{v}^{t}(l = +1 | \mathbf{S^t}, {v}_{t}) = \frac{\alpha + a_{v}^{t}}{\alpha + a_{v}^{t} + \beta + b_{v}^{t}}, \numberthis
\end{align*}
where $a_{v}^{t}$ and $b_{v}^{t}$ represent the counts of positive and negative worker labels received for vertex $v$ up to timestamp $t$. Additionally, we have $P_{v}^{t}(l = -1 | \mathbf{S^t}, {v}_{t}) = 1 - P_{v}^{t}(l = +1 | \mathbf{S^t}, {v}_{t})$. As worker labeling information propagates through the graph, the posterior probabilities for all vertices are updated at each timestamp, reflecting the latest insights gained from the labeling process.
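
As a brief illustration with hypothetical values (chosen for exposition only, not taken from our experiments), consider a uniform prior $\alpha = \beta = 1$ and a vertex $v$ that has received $a_{v}^{t} = 3$ positive and $b_{v}^{t} = 1$ negative worker labels. The marginal probabilities are then
\begin{align*}
    P_{v}^{t}(l = +1) = \frac{1 + 3}{1 + 3 + 1 + 1} = \frac{4}{6} \approx 0.667, \qquad
    P_{v}^{t}(l = -1) = 1 - \frac{4}{6} \approx 0.333.
\end{align*}
With no worker labels ($a_{v}^{t} = b_{v}^{t} = 0$), the same prior yields $P_{v}^{t}(l = +1) = 1/2$, the maximally uncertain case.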

The labeling process described above allows us to establish a filtration $\{\mathcal{F}_t\}_{t=0}^{T-1}$, where $\mathcal{F}_t$ is the $\sigma$-algebra generated by the sample path (${v}_{0}, y_{{v}_{0}}, \ldots, {v}_{t-1}, y_{{v}_{t-1}}$). In this context, $v_t$ represents the vertex selected from $V$ at timestamp $t$, and $y_{{v}_{t}}$ denotes the corresponding worker label obtained. This filtration implies that the choice of vertex at timestamp $t$ can be fully informed by the historical labeling outcomes up to timestamp $t-1$. Consequently, ${v}_{t}$ is $\mathcal{F}_t$-measurable, leading us to define the budget allocation policy as a sequence of vertex selections at each timestamp: $\pi = ({v}_{0}, \ldots, {v}_{T-1})$.

\subsection{Instance Correlation Estimation}
\label{factor_graph_estimation}
We utilize Belief Propagation (BP)~\citep{pearl2022reverend}, a message-passing algorithm, to disseminate labeling information throughout the graph. To implement BP, we first transform the input graph $G$ into a bipartite factor graph $FG$. This conversion adds a factor vertex for each edge $e_k = (v_i, v_j) \in E$, which connects to the vertices $v_i$ and $v_j$ via undirected edges. The resulting factor graph is denoted as $FG = (V \cup F, E')$, where $|E'| = 2 |E|$.

Each factor vertex is associated with a function $\mathbf{\phi_{e_k}}$ that specifies the proportion of information to be propagated between vertices $v_i$ and $v_j$, represented as: $\mathbf{\phi_{e_k}} = \begin{bmatrix} \omega_{e_k}(+1) & \omega_{e_k}(-1)\\  \omega_{e_k}(-1) & \omega_{e_k}(+1)\end{bmatrix}$. To streamline our notation, we use $e_k$ to refer to the factor vertex associated with edge $e_k$ throughout the remainder of this paper. Since the values of $\omega_{e_k}$ are initially unknown, we propose to estimate them using the marginal probabilities of the connected vertices. For the edge $e_k = (v_i, v_j) \in E$, we compute the marginal probability of $e_k$ at timestamp $t$ as follows:
\begin{align*}
    P^{t}_{e_k}(+1) = P^{t}_{v_i}(+1) \times P^{t}_{v_j}(+1) + P^{t}_{v_i}(-1) \times P^{t}_{v_j}(-1), \numberthis
    \label{eq4}
\end{align*}
where $P^{t}_{v_i}$ and $P^{t}_{v_j}$ represent the marginal probabilities of vertices $v_i$ and $v_j$ at timestamp $t$, respectively.
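
For illustration, assume hypothetical marginals $P^{t}_{v_i}(+1) = 0.8$ and $P^{t}_{v_j}(+1) = 0.7$. Eq. (\ref{eq4}) then gives
\begin{align*}
    P^{t}_{e_k}(+1) &= 0.8 \times 0.7 + 0.2 \times 0.3 = 0.56 + 0.06 = 0.62, \\
    P^{t}_{e_k}(-1) &= 0.8 \times 0.3 + 0.2 \times 0.7 = 0.24 + 0.14 = 0.38,
\end{align*}
where the second line assumes that $P^{t}_{e_k}(-1)$ is the complementary probability that the two end vertices disagree, so that $P^{t}_{e_k}(+1) + P^{t}_{e_k}(-1) = 1$. In other words, $P^{t}_{e_k}(+1)$ is the probability that the two end vertices share the same label when their marginals are treated as independent.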

An edge is considered labeled if both of its end vertices have received at least one worker label. However, in the early stages of the labeling process, most vertices remain unlabeled, necessitating a method to estimate the marginal probabilities of these unlabeled edges. To achieve this, we employ a Random Forest Regression (RFR) model~\citep{breiman2001random}. 

While alternative models, such as neural networks, could also be considered for estimating the edge potentials, we have found Random Forest Regression to be well-suited to our requirements. It trains quickly, performs robustly even with a limited number of labeled instances, and does not require extensive calibration. Given these advantages, RFR offers a good balance between efficiency and predictive accuracy for our application.

Our underlying intuition is that edges connecting nodes with similar attribute vectors are likely to exhibit similar marginal probabilities. To train the model, we concatenate the features of the end vertices $v_i$ and $v_j$ of each labeled edge and use this combined feature vector as the input, with the marginal probability computed according to Eq. (\ref{eq4}) as the regression target. Once trained, the model is used to estimate the marginal probabilities of the unlabeled edges in the graph.

The calculated marginal probabilities serve as estimates for $\omega_{e_k}$, allowing us to update $\mathbf{\phi_{e_k}}$ as follows: $\mathbf{\phi_{e_k}} = \begin{bmatrix} P^{t}_{e_k}(+1) & P^{t}_{e_k}(-1)\\  P^{t}_{e_k}(-1) & P^{t}_{e_k}(+1)\end{bmatrix}$. With this update in place, we utilize the constructed factor graph $FG$ to effectively propagate labeling information throughout the entire graph, ensuring that insights gained from labeled edges are shared with their neighbors.

\subsection{Labeling Information Propagation}
\label{labeling_information_propagation} 

In the factor graph $FG$, labeling information is effectively propagated through the exchange of messages between variable vertices and factor vertices. The computation of the message from a variable vertex to a factor vertex is carried out as follows:
\begin{align}
\underset{v \rightarrow f}{\mu}(x_v) = \prod_{f^* \in \mathcal{N}(v) \setminus \{f\}} \underset{f^* \rightarrow v}{\mu}(x_v), \numberthis
\label{eq5}
\end{align}
and the message from the factor vertex to the variable vertex is computed as follows:
\begin{equation}
\mu_{f \rightarrow v}(x_v) = \sum_{x_f' :\, x_v' = x_v}
\phi_f(x_f') \prod_{v^* \in \mathcal{N}(f) \setminus \{v\}} \mu_{v^* \rightarrow f}(x_{v^*}').
\label{eq6}
\end{equation}
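Here, $x_v \in \{+1, -1\}$ denotes a label of the variable vertex $v \in V$, and the sum in Eq. (\ref{eq6}) runs over the joint label assignments $x_f'$ to the vertices in $\mathcal{N}(f)$ that agree with $x_v$ on $v$. Furthermore, $\mathcal{N}(v)$ and $\mathcal{N}(f)$ denote the sets of neighboring vertices of the variable vertex $v$ and the factor vertex $f$, respectively.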
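
As a small worked example of the factor-to-variable update, consider a factor $f$ connecting $v$ and a single other vertex $v^*$, under the standard sum-product rule for a pairwise factor. The numbers are hypothetical: we assume $\phi_f$ has diagonal entries $0.62$ and off-diagonal entries $0.38$, and that the incoming message is $\mu_{v^* \rightarrow f}(+1) = 0.8$, $\mu_{v^* \rightarrow f}(-1) = 0.2$. Summing over the two labels of $v^*$ gives
\begin{align*}
    \mu_{f \rightarrow v}(+1) &= 0.62 \times 0.8 + 0.38 \times 0.2 = 0.496 + 0.076 = 0.572, \\
    \mu_{f \rightarrow v}(-1) &= 0.38 \times 0.8 + 0.62 \times 0.2 = 0.304 + 0.124 = 0.428.
\end{align*}
The message received by $v$ is thus a softened copy of the belief at $v^*$: the closer the diagonal entries of $\phi_f$ are to $1$, the more of the neighbor's belief is passed on unchanged.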

At each timestamp, messages are transmitted from the leaf vertices in the graph to a chosen vertex (\textit{forward propagation}), and then from this chosen vertex back to the leaf vertices (\textit{backward propagation}). Each message from a vertex $v \in V$ is initially set to its marginal probability. Following the updates defined in Eq. (\ref{eq5}) and Eq. (\ref{eq6}), the messages for all internal vertices are also refined. This belief propagation process is iterated multiple times until convergence is achieved, with messages being normalized at each step to prevent underflow. Ultimately, the posterior probability for each variable vertex $v \in V$ at timestamp $t$ is calculated as follows:
\begin{align}
P_{v}^{t}(+1) \propto \frac{\alpha + a_{v}^{t}}{\alpha + a_{v}^{t} + \beta + b_{v}^{t}} \prod_{j \in \mathcal{N}(v)} \underset{j \rightarrow v}{\mu}(+1). \numberthis
\label{eq7}
\end{align}
We can observe that the updates to the posterior probabilities of the vertices are entirely governed by the chosen vertex and the corresponding worker label received.
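
To illustrate Eq. (\ref{eq7}) with hypothetical numbers, suppose a vertex $v$ has marginal probabilities $0.6$ and $0.4$ for the labels $+1$ and $-1$, and a single neighboring factor $j$ that sends the message $\mu_{j \rightarrow v}(+1) = 0.7$, $\mu_{j \rightarrow v}(-1) = 0.3$. We assume here that the unnormalized score for the label $-1$ is formed analogously to Eq. (\ref{eq7}), using the marginal for $-1$ and the messages evaluated at $-1$. Then
\begin{align*}
    P_{v}^{t}(+1) \propto 0.6 \times 0.7 = 0.42, \qquad
    P_{v}^{t}(-1) \propto 0.4 \times 0.3 = 0.12,
\end{align*}
and normalizing yields $P_{v}^{t}(+1) = 0.42 / 0.54 \approx 0.778$ and $P_{v}^{t}(-1) \approx 0.222$. The neighbor's evidence therefore raises the belief in the label $+1$ from $0.6$ to roughly $0.78$.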

\subsection{Objective Function}
\label{objective_function}

Our objective is to minimize the uncertainty associated with instance labeling and instance correlation estimation by the end of the budget at timestamp $T$. The entropy of the posterior probabilities for vertices and the marginal probabilities for edges serves as a measure of labeling uncertainty. Consequently, we formulate our objective function to minimize the expected entropy of the labeling for both vertices and edges in the graph, conditioned on $\mathcal{F}_t$:
\begin{align}
\mathcal{H}_{T} & = \underset{\pi}{\min} \  \mathbb{E} \left( H^{T}(V) + H^{T}(E) \right). \numberthis
\label{eq8}
\end{align}
Here, $H^{T}(V) = - \sum_{v} \sum_{x} P^{T}_{v}(x) \log P^{T}_{v}(x)$ and $H^{T}(E) = - \sum_{e} \sum_{y} P^{T}_{e}(y) \log P^{T}_{e}(y)$ are the entropies of the vertices and edges in the graph, where $P^{T}_{v}$ is the posterior probability of vertex $v \in V$ and $P^{T}_{e}$ is the marginal probability of edge $e \in E$ at the end of the budget $T$.
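
As a numerical illustration (assuming the natural logarithm, since the base only rescales the entropy), a vertex with no information, $P_{v}(+1) = P_{v}(-1) = 0.5$, contributes
\begin{align*}
    - 2 \times 0.5 \log 0.5 = \log 2 \approx 0.693
\end{align*}
to $H(V)$, whereas a vertex with the hypothetical posterior $P_{v}(+1) = 0.9$, $P_{v}(-1) = 0.1$ contributes
\begin{align*}
    - 0.9 \log 0.9 - 0.1 \log 0.1 \approx 0.095 + 0.230 = 0.325.
\end{align*}
Labels that move a posterior away from $0.5$ therefore lower the objective, and the contribution vanishes as the posterior approaches $0$ or $1$.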

The objective is to identify a policy that minimizes the expected entropy in Eq. (\ref{eq8}) by the end of the budget $T$. The corresponding value function is given in Eq. (\ref{eq9}); any policy $\pi$ that attains the minimum in Eq. (\ref{eq9}) is an optimal policy, denoted as $\pi^{*}$.
\begin{equation}
V(S^T) \dot= \inf_{\pi} \ \mathbb{E}^\pi \left[ \mathbb{E} \left( H^T(V) + H^T(E) \right) \right],
\label{eq9}
\end{equation}
where $V(S^{T})$ denotes the value function at the end of the budget $T$, and $\pi$ is the policy that selects the instance to be labeled by a worker at each timestamp. Additionally, $\mathbb{E}^{\pi}$ denotes the expectation taken over the sample paths (${v}_{0}, y_{{v}_{0}}, \ldots, {v}_{T-1}, y_{{v}_{T-1}}$) generated by the policy $\pi$.

\subsection{Reward Function}
\label{sec:optimal_policy}

We approach the task of identifying the optimal policy $\pi^{*}$ for the value function defined in Eq. (\ref{eq9}) by framing it as a Markov Decision Process (MDP). The final expected uncertainty is influenced by the selection of instances at each timestamp. To address this, we decompose the final expected uncertainty into a sum of \textit{stage-wise rewards}, following the methodology outlined in~\citep{xie2012sequential}. That work primarily addresses an \textit{infinite-horizon} problem focused on optimizing stopping times, but the technique has been shown to apply to \textit{finite-horizon} scenarios as well~\citep{chen2013optimistic}.

Given that the value function accounts for the total entropy of the vertices and edges within the graph, we define the reward function as the change in this total entropy between two timestamps. A higher reward signifies a greater reduction in uncertainty regarding the labeling of the graph's vertices and edges.

\begin{prop}
If the stage-wise expected reward between two timestamps $t$ and $t+1$ is defined as
\begin{align}
    R(S^{t}, v_{t}) = &\mathbb{E} (( H^{t}(V) +H^{t}(E)) \nonumber \\
    &- (H^{t+1}(V) + H^{t+1}(E)) | \mathbf{S^{t}}, v_{t}), 
    \label{eq10}
\end{align}
then the value function in Eq. (\ref{eq9}) becomes:
\begin{align}
    V(S^T) =  V(S^{0}) - \underset{\pi}{\mathrm{sup}} \ \mathbb{E}^{\pi} \left( \sum_{t=0}^{T-1} R(\mathbf{S^{t}}, v_{t}) \right). \numberthis
    \label{eq11}
\end{align}
Any policy $\pi$ that attains the supremum for Eq. (\ref{eq11}) is the optimal policy $\pi^{*}$. Here,  $V(S^{0}) = H^{0}(V) +H^{0}(E)$. We provide the derivation of Proposition \ref{prop3} in Appendix \ref{proof_of_proposition}.
\label{prop3}
\end{prop}
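
Informally, the decomposition rests on a telescoping sum. The following sketch is not a substitute for the full derivation; it uses the shorthand $H^{t} = H^{t}(V) + H^{t}(E)$, introduced here only for brevity, and assumes that the initial state $\mathbf{S^{0}}$ is fixed. For any policy $\pi$, the tower property of conditional expectation gives
\begin{align*}
    \mathbb{E}^{\pi} \left( \sum_{t=0}^{T-1} R(\mathbf{S^{t}}, v_{t}) \right)
    = \mathbb{E}^{\pi} \left( \sum_{t=0}^{T-1} \left( H^{t} - H^{t+1} \right) \right)
    = H^{0} - \mathbb{E}^{\pi} \left( H^{T} \right).
\end{align*}
Since $H^{0}$ does not depend on $\pi$, minimizing the expected final entropy $\mathbb{E}^{\pi}(H^{T})$ is equivalent to maximizing the expected sum of stage-wise rewards.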

Proposition \ref{prop3} allows us to formulate the minimization problem in Eq. (\ref{eq9}) as a $T$-stage Markov Decision Process (MDP) and to transform it into the maximization of the expected cumulative reward, as shown in Eq. (\ref{eq11}). Since the marginal probability of the edges is derived from the worker labels obtained for the vertices, the $T$-stage MDP depends solely on the state of the vertices at each timestamp. Thus, the $T$-stage MDP is represented by the tuple $\{T, \{\mathcal{S}^{t}\}, \mathcal{A}, \mathcal{P}^{t}, R(\mathbf{S^{t}}, v_{t})\}$. In this tuple, $T$ is the budget, i.e., the number of worker labels we can acquire; $\mathcal{S}^{t}$, the state space at stage $t$, encompasses all possible states reachable at that stage; $\mathcal{A} = \{1, 2, \ldots, N\}$ denotes the action space, i.e., the set of instances eligible for labeling next; $\mathcal{P}^{t} = \{P_{1}^{t}, P_{2}^{t}, \ldots, P_{N}^{t}\}$ comprises the posterior probabilities at timestamp $t$ for each vertex $v_i \in V$; and $R(\mathbf{S^{t}}, v_{t})$ is the expected reward defined in Eq. (\ref{eq10}). Once a label $y_{v_{t}}$ is obtained for the selected vertex $v_t$ at timestamp $t$, the marginal probability of $v_t$ is updated accordingly. Therefore, we have
\small
\begin{align}
    \mathcal{S}^{t} = \left\{ \{p_{1_v}^{t}, p_{2_v}^{t}\}_{v=1}^{N}: p_{1_v}^{t}, p_{2_v}^{t} \in [0, 1],  p_{1_v}^{t} + p_{2_v}^{t} = 1 \right\}. \numberthis
    \label{eq12}
\end{align}
\normalsize
The posterior probabilities of multiple vertices can change as a result of the labeling information propagated from the chosen vertex and the obtained worker label, and the marginal probabilities of edges may change as well. Importantly, all these updates are entirely determined by the selected vertex and the corresponding worker label. Consequently, by the Markovian property of $\{\mathbf{S^{t}}\}$, it suffices to consider a Markovian policy \citep{powell2007approximate}, where the choice of $v_t$ is made solely based on the current state $\mathbf{S^{t}}$.

\subsection{Efficient Approximate Policy}
\label{approximate_policy}

Finding the optimal policy for the value function in Eq. (\ref{eq9}) is non-trivial. Therefore, we propose efficient approximate policies that select, at each timestamp, the instance with the largest stage-wise reward for obtaining a worker label. These policies aim to approximate the supremum in Eq. (\ref{eq11}) within the framework of the $T$-stage MDP. At any state $\mathbf{S^{t}}$ at timestamp $t$, when a vertex $v_t \in V$ is chosen for a worker label, the worker can provide either a label of $+1$ or $-1$. Consequently, the policies must account for both outcomes when calculating the expected reward. Let $R_{1}(\mathbf{S^{t}}, v_{t})$ and $R_{2}(\mathbf{S^{t}}, v_{t})$ represent the rewards for obtaining labels $+1$ and $-1$, respectively. The expected reward can then be expressed as:
\begin{align}
    R(\mathbf{S^{t}}, v_{t}) = p_1 R_{1}(\mathbf{S^{t}}, v_{t}) + p_2 R_{2}(\mathbf{S^{t}}, v_{t}), \numberthis
    \label{eq13}
\end{align}
where $p_1 = \frac{\alpha + a_{v}^{t}}{\alpha + a_{v}^{t} + \beta + b_{v}^{t}}$ and $p_2 = \frac{\beta + b_{v}^{t}}{\alpha + a_{v}^{t} + \beta + b_{v}^{t}}$ are the marginal probabilities of the labels $+1$ and $-1$ for the chosen vertex $v = v_t$ at timestamp $t$. The optimistic reward can be expressed as:
 \begin{align}
    R^{+}(\mathbf{S^{t}}, v_{t}) = \max(R_{1}(\mathbf{S^{t}}, v_{t}), R_{2}(\mathbf{S^{t}}, v_{t})). \numberthis
    \label{eq14}
\end{align}
The first proposed approximate policy, OPTUENT-EXP, denoted as $\hat{\pi} = (v_0, \ldots, v_{T-1})$, greedily selects at each timestamp the instance with the highest expected reward:
\begin{equation}
v_t = \arg\max_{v} \left( R(\mathbf{S^{t}}, v) \dot= p_1 R_1(\mathbf{S^{t}}, v) + p_2 R_2(\mathbf{S^{t}}, v) \right).
\label{eq15}
\end{equation}
The second proposed approximate policy, OPTUENT-OPT, denoted as $\pi^{o} = (v_0, \ldots, v_{T-1})$, selects at each timestamp the instance with the highest optimistic reward, i.e., the larger of the two rewards over the possible label outcomes:
\begin{align}
    v_t = \underset{v}{\mathrm{argmax}} \left( R^{+}(\mathbf{S^{t}}, v)\ \dot=\ \max(R_{1}(\mathbf{S^{t}}, v), R_{2}(\mathbf{S^{t}}, v))\right). \numberthis
    \label{eq16}
\end{align}
Using Eq. (\ref{eq15}) or Eq. (\ref{eq16}), we select the vertex for which to obtain a worker label at each timestamp $0 \leq t < T$. The complete procedure is outlined in Algorithm~\ref{alg:optuent}.
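
The two policies can disagree, as the following hypothetical example shows (the vertex names $u$, $w$ and all reward values are chosen purely for illustration). Suppose that, at state $\mathbf{S^{t}}$, vertex $u$ has $p_1 = 0.9$, $p_2 = 0.1$, $R_1 = 0.02$, $R_2 = 0.50$, and vertex $w$ has $p_1 = p_2 = 0.5$, $R_1 = R_2 = 0.10$. Then
\begin{align*}
    R(\mathbf{S^{t}}, u) &= 0.9 \times 0.02 + 0.1 \times 0.50 = 0.068, & R^{+}(\mathbf{S^{t}}, u) &= 0.50, \\
    R(\mathbf{S^{t}}, w) &= 0.5 \times 0.10 + 0.5 \times 0.10 = 0.100, & R^{+}(\mathbf{S^{t}}, w) &= 0.10.
\end{align*}
OPTUENT-EXP selects $w$, whose expected reward is larger, while OPTUENT-OPT selects $u$, whose unlikely label outcome would yield the largest reduction in entropy.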

\begin{algorithm}[t]
\caption{Uncertainty-Guided Budget Allocation for Graph Labeling}
\label{alg:optuent}
\KwIn{
    Graph $G=(V,E)$; Budget $T$; Beta prior parameters $\alpha$, $\beta$; \\
    Policy $\pi \in \{\text{OPTUENT-EXP}, \text{OPTUENT-OPT}\}$
}
\KwOut{
    Posterior label estimates $\{\theta_{v_i}\}_{v_i \in V}$; Edge correlation estimates $\{\omega_{e_k}\}_{e_k \in E}$
}

Initialize marginal probabilities $\theta_{v_i} \sim \text{Beta}(\alpha, \beta)$ for all ${v_i} \in V$\;
Initialize labeled set $\mathcal{L} \leftarrow \emptyset$\;

\For{$t = 0$ \KwTo $T-1$}{
    
    Compute marginal edge probabilities using vertex marginals via Eq.~\ref{eq4}\;
    Estimate edge correlations $\{\omega_{e_k}\}$ for unlabeled edges using Random Forest Regression trained on labeled edge features and Eq.~\ref{eq4}\;
    Compute posterior probabilities using current marginals and Belief Propagation (Eq.~\ref{eq7})\;
    \ForEach{$v_i \in V$}{
        Compute rewards $R_1(S^t, v_i), R_2(S^t, v_i)$ for label outcomes $+1$ and $-1$\;
        
        \eIf{$\pi = \text{OPTUENT-EXP}$}{
            Compute expected reward $R(S^t, v_i)$ using Eq.~\ref{eq13}\;
        }{
            Compute optimistic reward $R^+(S^t, v_i)$ using Eq.~\ref{eq14}\;
        }
    }
    
    Select $v_t = \arg\max_{v_i \in V} R(S^t, v_i)$ or $R^+(S^t, v_i)$ depending on $\pi$\;
    
    Query label $y_{v_t} \sim \text{Bernoulli}(\theta_{v_t})$\;
    
    Update counts $(a_{v_t}^{t}, b_{v_t}^{t})$ and re-run Belief Propagation to update posterior\;
    
    $\mathcal{L} \leftarrow \mathcal{L} \cup \{y_{v_t}\}$\;
}
\Return Posterior label estimates $\{\theta_{v_i}\}_{v_i \in V}$ and edge correlation estimates $\{\omega_{e_k}\}_{e_k \in E}$
\end{algorithm}

\subsection{Proposed Policies are Consistent}

To demonstrate the consistency of the proposed policies, we must show that as the budget $T$ approaches infinity, the sum of entropy for the vertices and edges in the graph converges to a constant value. This constant is defined by the true label of each instance $\theta_{v_i}$ for $v_i \in V$ and the instance correlation $\omega_{e_k}$ for every edge $e_k \in E$. Thus, as $T$ goes to infinity, each vertex should receive an infinite number of labels, ensuring that the estimated $\theta_{v_i}$ aligns with the true label, and the estimated $\omega_{e_k}$ converges to its true value.

To establish consistency, we first show in Appendix \ref{sec:rfr} that the random forest regressor achieves over 95\% accuracy with a small budget of 40 on the large Cora and Pubmed datasets, indicating rapid convergence. This observation suggests that as $T$ goes to infinity, changes in edge uncertainty become negligible, allowing us to focus solely on the entropy of vertex labeling. We show that the posterior probability for each vertex $v_i \in V$ is updated based on its marginal probability and that of the leaf vertices in the factor graph $FG$. The proposed reward function in Eq. (\ref{eq10}) is proportional to the change in the marginal probability of the chosen vertex $v_t$, ensuring that both OPTUENT-EXP and OPTUENT-OPT label each vertex infinitely many times as the budget increases. Since we assume all workers are equally reliable, the estimates converge to $\theta_{v_i}$ for each $v_i \in V$ and to $\omega_{e_k}$ for every edge $e_k \in E$. Consequently, the sum of entropy for the vertices and edges converges to a constant value, demonstrating that the proposed policies $\hat{\pi}$ and $\pi^{o}$ are consistent. A detailed proof of this consistency is provided in Appendix \ref{consistency_proof}.

