\vspace{-0.1cm}
\section{Related Work}
\label{sec:relatedwork}
% \vskip -0.1cm
\cite{hiranandani2018eliciting} formalized the problem of ME for binary classification with (quasi-)linear metrics and later extended it to the multiclass setting~\citep{hiranandani2019multiclass}. Unlike these works, we elicit more complex quadratic metrics and also provide an information-theoretic lower bound on the query complexity (Theorem~\ref{thm:lb}). 
Prior works on ME %~\cite{hiranandani2019multiclass, hiranandani2020fair} 
offer no such lower-bound guarantees. 
Learning linear functions passively using pairwise comparisons is a mature field 
% ~\cite{joachims2002optimizing, herbrich2000large, peyrard2017learning}, 
\citep{joachims2002optimizing, peyrard2017learning}, 
but unlike their active learning counterparts~\citep{settles2009active, kane2017active}, these methods are not query-efficient. 
Studies such as~\cite{qian2015learning} provide active linear elicitation strategies 
but offer no guarantees and work with a different query space. We are unaware of prior work that \emph{provably} elicits a quadratic function, either passively %or more importantly, 
or
actively using pairwise comparisons. Our work is thus a significant first step towards active, nonlinear metric elicitation. 


%Quadratic elicitation increases use-cases of ME. One such use-case is eliciting metrics for fairness. 
The use of metric elicitation for fairness is relatively new, with
% As far as our knowledge is concerned, 
some work on eliciting \textit{individual} fairness metrics~\citep{ilvento2019metric, mukherjee2020two}. \cite{hiranandani2020fair} is the only work we are aware of that elicits \textit{group-fair} metrics, which we extend to handle more general 
% family of 
metrics. 
\cite{zhang2020joint} 
% propose an approach to 
elicit the trade-off between accuracy and fairness using complex ratio queries. In contrast, we jointly elicit the predictive performance, fairness violation, and trade-off %together as a non-linear function, all 
using simpler pairwise queries. 
Lastly, there has been
% for constrained classification focus on
work on learning fair classifiers under constraints 
\citep{zafar2017constraints,agarwal2018reductions}.
In contrast, we take the regularization view of fairness, where the fairness violation is included in the objective 
\citep{kamishima2012fairness}.
% , corbett2017algorithmic}.
% menon2018cost}.

Our work is also related to decision-theoretic \emph{preference elicitation}, but with the following key differences. We focus on estimating the utility function (metric) explicitly, whereas prior work such as~\citep{boutilier2006constraint, benabbou2017incremental} seeks to find the optimal decision by minimizing the max-regret over a set of utilities. Studies that directly learn the utility~\citep{perny2016incremental} do not provide query complexity guarantees for pairwise comparisons. Formulations that consider a finite set of alternatives~\citep{boutilier2006constraint} are starkly different from ours, because the set of alternatives in our case (i.e., classifiers or rates) is infinite. 
% , leading to stark differences in formulations and solutions. 
Most of the papers in this literature focus on linear or bilinear~\citep{perny2016incremental} utilities, except for~\citep{braziunas2012decision} (GAI utilities) and~\citep{benabbou2017incremental} (Choquet integral), whereas we focus on quadratic metrics, which are useful for classification tasks and especially for fairness. We are not aware of any decision-theory literature that \emph{provably} elicits quadratic (or polynomial) utility functions using pairwise comparisons.

Eliciting performance metrics bears similarities to \emph{learning reward functions} in the inverse reinforcement learning literature~\citep{wu2020efficient,abbeel2004apprenticeship,levine2011nonlinear,sadigh2017active} and the \emph{Bradley-Terry-Luce model with features} in the learning-to-rank literature~\citep{shah2015estimation, niranjan2017inductive}. However, these studies focus either on eliciting linear utilities or on passively learning utility functions. Our work differs substantially: we are tied to the geometry of the space of classification error statistics, and we elicit quadratic utility functions actively, using only pairwise comparisons. Moreover, we provide query complexity bounds along with a lower bound. We further elaborate on the specific differences from these papers in Appendix~\ref{append:sec:relwork}.