\section{Problem Statement}\label{ProblemStatement}
Consider a ground set of data points $\mathcal{X}$ with groundtruth labeling function $f^*: \mathcal{X} \rightarrow \mathcal{Y}$. The active learning problem we study follows a two-stage protocol:
\begin{enumerate}
    \item A \textit{pretraining stage}, where we train an acquisition function from an initial pool of data points;
    \item An \textit{acquisition stage}, where we actively select a set of new examples to label all at once.
\end{enumerate}

We denote the initial (labeled) pretraining set by $\LabeledSet_0 \subseteq \mathcal{X}$, with $|\LabeledSet_0|=k$, and the labeled set after the acquisition stage by $\LabeledSet_1$, with $|\LabeledSet_1|=k+B$, where $B$ is the labeling budget. The unlabeled sets before and after acquisition are denoted by $\Unlabeled_0$ and $\Unlabeled_1$, respectively.
The groundtruth utility function is defined as $u: 2^{\mathcal{X}} \rightarrow \mathbb{R}$, where $u(\mathcal{\utilitysample})$ quantifies the utility of a subset $\mathcal{\utilitysample} \subseteq \mathcal{X}$ as the validation accuracy of the classifier $f$ trained on the (labeled) data in $\mathcal{\utilitysample}$. Our goal is to find an optimal subset $\LabeledSet_1^*$ such that $f$, when trained on it, achieves the highest validation accuracy, thereby maximizing the utility function $u$:
\begin{align}
\label{mainproblem}
    \LabeledSet_1^* \in \argmax_{\LabeledSet_0 \subseteq \LabeledSet_1 \subseteq \mathcal{X}, |\LabeledSet_1 \setminus \LabeledSet_0| = B} u(\LabeledSet_1).
\end{align}
Here, $u(\LabeledSet_1) = \expctover{x}{\unit(f(x) = f^*(x)) \mid \LabeledSet_1}$ for classification tasks, and can be estimated by the accuracy of the resulting $f$ on a validation set $\mathcal{S}_{\text{val}} \subseteq \mathcal{X}$.
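As a concrete sketch of this estimate, consistent with the definition of $u$ as validation accuracy, one natural choice is the empirical accuracy on $\mathcal{S}_{\text{val}}$. The symbol $\hat{u}$ is introduced here for illustration only and is not part of the notation above:
\begin{align*}
    \hat{u}(\LabeledSet_1) = \frac{1}{|\mathcal{S}_{\text{val}}|} \sum_{x \in \mathcal{S}_{\text{val}}} \unit(f(x) = f^*(x)),
\end{align*}
where $f$ is the classifier trained on $\LabeledSet_1$. Computing $\hat{u}$ requires the groundtruth labels of both $\LabeledSet_1$ and $\mathcal{S}_{\text{val}}$.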

Learning $u$ in Equation~\ref{mainproblem} is challenging in our setting. Indeed, even approximating $u$ requires the groundtruth utility for a large collection of subsets of the labeled pool, whereas in practice the labeling budget is limited. We emphasize that the instances are selected \textit{non-adaptively} in the acquisition stage, i.e., the selection of an instance does not depend on the labels of previously selected instances. Our aim is to devise an acquisition strategy whose selected subset maximizes downstream classification accuracy.
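
To give a sense of the size of the search space in Equation~\ref{mainproblem}, assume $\mathcal{X}$ is finite. Every feasible $\LabeledSet_1$ adds exactly $B$ points from $\mathcal{X} \setminus \LabeledSet_0$ to $\LabeledSet_0$, so the number of candidate subsets is
\begin{align*}
    \binom{|\mathcal{X}| - k}{B}.
\end{align*}
For purely illustrative values (hypothetical, not taken from our experiments) of $|\mathcal{X}| - k = 1000$ and $B = 10$, this is already about $2.6 \times 10^{23}$ candidates, so exhaustively evaluating $u$ over all of them is infeasible.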
