% \vspace{-0.45cm}
\section{Introduction}
\label{sec:intro}
% \vspace{-0.1cm}
% \vskip -0.2cm
\emph{Given a classification task, which performance metric should the classifier optimize?} Practitioners often face this question when developing machine learning solutions. For example, consider cancer diagnosis, where a doctor applies a cost-sensitive predictive model to classify patients into cancer categories~\citep{yang2014multiclass}. The costs may be based on known consequences of misdiagnosis, i.e., the side effects of treating a healthy patient vs.\ the mortality rate for not treating a sick patient. Although it is clear that the chosen costs directly determine the model decisions and dictate patient outcomes, it is not clear how to translate the expert's intuition into precise quantitative cost trade-offs, i.e., the performance metric.

Indeed, the same is true in a variety of other domains, including \emph{fair machine learning}, where picking the right metric is a critical challenge~\citep{Dmitriev2016MeasuringM, zhang2020joint}. The issue is exacerbated when the practitioner's notion of fairness does not exactly match any standard fairness criterion. For example, a practitioner may wish to weight each group discrepancy differently, but may not be able to provide the exact weights or a precise mathematical expression that reflects the practitioner's innate notion of fairness.

\begin{figure}[t]
    \centering
    \vspace{-5pt}
    \includegraphics[scale = 0.29]{plots/Presentation5.jpeg}
    \vspace{-0.1cm}
    \caption{Metric Elicitation~\citep{hiranandani2018eliciting}.}
    \label{fig:meframework}
    \vskip -0.6cm
\end{figure}


% To address this issue, 
\cite{hiranandani2018eliciting, hiranandani2019multiclass, hiranandani2020fair} addressed this issue by formalizing the framework of \emph{Metric Elicitation (ME)}, whose goal is to estimate a performance metric using preference feedback from a user. The motivation is that by employing metrics that reflect a user's innate trade-offs given the task, context, and population at hand, one can learn models that best capture the user's preferences~\citep{hiranandani2018eliciting}. As humans are often inaccurate in providing absolute quality feedback~\citep{qian2013active}, \cite{hiranandani2018eliciting} proposed using pairwise comparison queries, where the user (oracle) is asked to compare two classifiers and provide a relative preference. Using such pairwise comparison queries, ME aims to recover the oracle's metric. Figure~\ref{fig:meframework} (reproduced from \cite{hiranandani2018eliciting}) depicts the ME framework.
% {\color{red}\citet{hiranandani2020fair} further extend this setup to handle group fairness metrics.}

A notable drawback of existing ME strategies is that they only handle linear or quasi-linear functions of predictive rates, which can be restrictive for the many applications where the metrics are non-linear. For example, in \emph{fair machine learning}, classifiers are often judged by measuring discrepancies between predictive rates for different protected groups~\citep{hardt2016equality}. Similarly, discrepancies among different distributions are measured in \emph{distribution matching} applications~\citep{narasimhan2018learning, Fab1}. A common measure of discrepancy in such applications is the squared difference, which is a quadratic metric that cannot be handled by existing approaches. Quadratic metrics also find use in class-imbalanced learning~\citep{goh2016satisfying, narasimhan2018learning} (see Section~\ref{ssec:metric} for examples). Motivated by these examples, in this paper we propose strategies for eliciting metrics defined by \emph{quadratic} functions of rates, which encompass linear metrics as special cases. Our approach also generalizes to eliciting polynomial metrics, a universal family of functions~\citep{stone1948generalized}, allowing one to better capture real-world human preferences.
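
As a small illustration of why squared differences are quadratic (the symbols $u$, $v$, and $M$ are introduced only for this example and are not part of the formal setup), let $u$ and $v$ denote the values of a given predictive rate for two protected groups. Their squared discrepancy can then be written as
\[
(u - v)^2 \;=\; u^2 - 2uv + v^2 \;=\;
\left(\begin{array}{cc} u & v \end{array}\right)
M
\left(\begin{array}{c} u \\ v \end{array}\right),
\qquad
M = \left(\begin{array}{rr} 1 & -1 \\ -1 & 1 \end{array}\right),
\]
which is a quadratic form in the rates with no linear term, so it is not a linear function of $(u, v)$.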

% For approximating the quadratic metric, 
Our high-level idea is to approximate the quadratic metric with multiple linear functions, employ linear ME to estimate the individual local slopes, and combine the slope estimates to reconstruct the original metric. While natural and elegant, this approach comes with non-trivial challenges. First, we must choose center points for the local-linear approximations, and the chosen points must represent feasible queries. Second, because we use pairwise queries, we only receive \emph{slopes} (directions) and not magnitudes for the local-linear functions, so an intricate analysis is required to reconstruct the original metric and to deal with the multiplicative errors that result. Despite these challenges, our method has a query complexity that is only \emph{linear} in the number of unknowns, which we show is \emph{near-optimal}. To our knowledge, we are the first to prove such a lower bound for metric elicitation.
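
The local-linear idea can be sketched as follows, under assumed notation that is used only for this illustration: suppose the metric has the form $\phi(r) = \inner{a}{r} + \frac{1}{2} r^\top B r$ for a rate vector $r \in \Rmbb^q$, an unknown vector $a \in \Rmbb^q$, and an unknown symmetric matrix $B \in \Rmbb^{q \times q}$. A first-order expansion around a center point $z$ gives
\[
\phi(r) \;\approx\; \phi(z) + \inner{a + Bz}{\,r - z},
\qquad
\nabla \phi(z) = a + Bz .
\]
Pairwise comparisons near $z$ can reveal at most the direction of this local slope, i.e., the normalized vector $(a + Bz)/\|a + Bz\|_2$ (when the slope is nonzero), and not its magnitude. For two centers $z$ and $z'$, the unnormalized slopes differ by
\[
(a + Bz') - (a + Bz) \;=\; B\,(z' - z),
\]
so slopes at several centers carry information about $B$. However, each estimated direction comes with its own unknown scale, so simple differencing does not suffice and the scales must be related across centers.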

We further elaborate on eliciting group-fair metrics. The prior work by \citet{hiranandani2020fair} considers a restricted class of fairness metrics, where the fairness discrepancies are defined to be the \emph{absolute} differences between group-specific rates. Moreover, their approach does not generalize to other families of metrics. In contrast, we handle a more general family of non-linear fairness metrics defined by quadratic functions of group rate differences, and show how our proposed quadratic ME approach can be easily adapted to elicit such group-fair quadratic metrics.

In summary, we make the following contributions:
\bitemize[leftmargin=8pt, itemsep=0pt]
\item We propose a novel quadratic ME algorithm for classification problems, which requires only pairwise preference feedback over either classifiers or predictive rates.

\item Specific to group-based fairness tasks, we show how to jointly elicit the predictive performance and fairness metrics, and the trade-off between them.
\item We show that the proposed approach is robust to feedback noise and finite-sample noise, and requires a near-optimal number of queries.
\item We empirically validate the proposed approach for multiple classes and groups using simulated oracles.
\item 
% we discuss how our strategy can be 
We discuss how our strategy can be generalized to elicit higher-order polynomials by recursively applying the procedure to elicit lower-order approximations. 
\eitemize

\textbf{Paper Organization:} For ease of exposition, we first discuss quadratic metric elicitation in the usual multiclass classification setup without fairness. Section~\ref{sec:background} contains the problem setup and the associated background, and Section~\ref{sec:quadme} describes the proposed quadratic ME procedure. 
We then cover ME under the multiclass-multigroup framework in Section~\ref{sec:fairme}, where we additionally have protected group information embedded in the problem setup.
In Section~\ref{sec:guarantees}, we provide guarantees for our proposed procedures, and in Section~\ref{sec:experiments}, we present our experiments. We discuss related work in Section~\ref{sec:relatedwork} and provide concluding remarks in Section~\ref{sec:discussion}.

% \vspace{-0.1cm}
\textbf{Notation.}
For $k \in \Zmbb_+$, we denote $[k] = \{1, \ldots, k\}$ and use $\Delta_k$ to denote the $(k-1)$-dimensional simplex. % $\norm{\cdot}_2$  and $\norm{\cdot}_\infty$ denote the $\ell_2$-norm and $\ell_\infty$-norm, respectively.
We denote inner products by $\inner{\cdot}{\cdot}$ and Hadamard products by $\odot$.
% For a matrix $\Ambf$, $\offdiag(\Ambf)$ returns a vector of off-diagonal elements of $\Ambf$. 
The Frobenius norm is denoted by $\|\cdot\|_F$, and $\alphambf_i \in \Rmbb^q$ denotes the $i$-th standard basis vector, whose $i$-th coordinate is 1 and all other coordinates are 0.