\section{Related Work} 
\noindent \textbf{Utility model learning.} Surrogate models build on a rich and growing body of machine learning literature \citep{konyushkova2017learning, coleman2019selection, kossen2022active, ilyas2022datamodels, wang2023one, engstrom2024dsdm}. These works improve data acquisition by training a regressor to predict the expected reduction in error rate \citep{konyushkova2017learning,wang2023one}, using reduced model architectures or fewer training epochs as proxy models \citep{coleman2019selection,wang2023one}, approximating the distribution of labels and unobserved features \citep{li2021active}, and harnessing the datamodels framework \citep{ilyas2022datamodels} to minimize the trained model's loss on target tasks \citep{engstrom2024dsdm}. In contrast, our method uses ranking-based neural networks to \textit{learn} a data acquisition function (the utility model) that \textit{estimates} which of two equal-sized training subsets would yield the higher utility value. Unlike existing works, we train the utility model with far fewer samples, and we sample subsets of various sizes rather than fixing the subset size.

\noindent \textbf{Planning-based vs learning-based AL strategies.}
Classical AL methods rely on a predefined %certain forms of 
acquisition strategy, including uncertainty sampling \citep{settles2012active, shen2017deep, gal2017deep}, diversity sampling \citep{sener2017active, yehuda2022active}, or a combination of the two \citep{xie2022towards, citovsky2021batch, parvaneh2022active}. Meanwhile, there is a long line of work on \textit{learning-based} acquisition functions 
%instead of predefined selection criteria 
\citep{learntoactivelearning, learnalgoforactivelearning, wang2023one, sinha2019variational, yan2022budget, yoo2019learning, li2021active, killamsetty2021glister}. For instance, prior works combine meta-learning \citep{killamsetty2021glister} or semi-supervised learning \citep{borsos2021semi} with bi-level optimization to design acquisition functions.
\citet{yoo2019learning} introduce a ``loss prediction module'' that is trained to predict the target losses of unlabeled inputs by ranking the predicted classifier losses of pairs of instances, and they query the instances that the classifier is likely to predict incorrectly. We draw inspiration from \citet{killamsetty2021glister, yoo2019learning} by leveraging bi-level training as a subroutine to improve the generalizability of the utility model and by querying the highest-ranked unlabeled instances.

\noindent \textbf{Learning to rank.} 
Ranking techniques have been foundational in fields such as information retrieval \citep{liu2009learning}, recommendation systems \citep{karatzoglou2013learning,li2022learning}, and large language models \citep{ouyang2022training}. Motivated by \citet{yoo2019learning, li2021learning}, we shift from the traditional approach of learning the cross-entropy loss on unlabeled instances to ranking the utility of paired subsets of data.
While both works \citep{yoo2019learning, li2021learning} treat the ranking of predicted losses as an uncertainty measure, our method centers on gauging the utility of labeled data subsets, where utility is the validation accuracy obtained after training on the subset.
To the best of our knowledge, our method is the first to rank pairs of subsets and to link that ranking directly to the performance of the learning algorithm on the validation set. In Section~\ref{ablation}, we show the computational advantages and empirical success of integrating RankNet \citep{burges2005learning} and of avoiding regression on unlabeled subsets.
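
For intuition, the pairwise ranking idea can be sketched with the standard RankNet objective \citep{burges2005learning}. The notation below is assumed purely for illustration and is not the formal definition used by our method: $f_\theta$ denotes a hypothetical utility model that assigns a scalar score to a subset, $S_1$ and $S_2$ are two equal-sized subsets, $U(\cdot)$ is the observed utility (validation accuracy), and $\sigma$ is the logistic sigmoid. A RankNet-style model would estimate the probability that $S_1$ is the more useful subset and minimize a binary cross-entropy against the observed ordering:
\begin{equation*}
\hat{p}_{12} = \sigma\bigl(f_\theta(S_1) - f_\theta(S_2)\bigr), \qquad
\mathcal{L}(\theta) = -\,y_{12}\log \hat{p}_{12} - (1 - y_{12})\log\bigl(1 - \hat{p}_{12}\bigr), \qquad
y_{12} = \mathbb{1}\bigl[\,U(S_1) > U(S_2)\,\bigr].
\end{equation*}
Under this sketch, the loss depends only on the difference of the two scores, so the model needs to recover the relative order of utilities rather than their exact values, which is what distinguishes ranking from regressing on $U$ directly.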
