\section{Background}
\label{sec:background}

We consider a $k$-class classification setting with $X \in \Xcal$ and $Y \in [k]$ denoting the input and output random variables, respectively. We assume access to an $n$-sized sample $\{(\xmbf, y)_i\}_{i=1}^n$ generated \emph{iid} from a distribution $ \Pmbb(X, Y)$. 
We work with randomized classifiers $h : \Xcal \rightarrow \Delta_k$ that, for any $\xmbf$, give a distribution $h(\xmbf)$ over the $k$ classes, and use 
 $\Hcal = \{h : \Xcal \rightarrow \Delta_k\}$
 to denote the set of all classifiers. 

\emph{Predictive rates:} 
We denote the predictive rates for a classifier $h$ by the vector $\rmbf(h, \Pmbb) \in \Rmbb^{k}$, where the $i$-th coordinate is the fraction of label-$i$ examples for which the randomized classifier $h$ also predicts class $i$:
\begin{align}
	r_{i}(h, \Pmbb) \coloneqq \Pmbb(h(X) = i | Y = i)  \quad \text{for} \; i \in [k].
	\label{eq:components}
\end{align}
The probability above is over the draw of $(X, Y) \sim \Pmbb$ and the randomness in $h$. The proposed setup and solution (discussed later) extend easily to general predictive rates of the form $\Pmbb(h(X) = j | Y = i)$ for $i, j \in [k]$. For simplicity, we defer this extension to Appendix~\ref{append:generalquad}.

\emph{Metrics:} We consider metrics  that are defined by a general function $\phi : [0, 1]^{k}  \rightarrow \Rmbb$ of rates: 
% \vspace{-0.2cm}
% \begin{align*}
\[\phi(\rmbf(h, \Pmbb)).\]
% \end{align*}
% \vskip -0.2cm
This includes the (weighted) accuracy
$\phi^{\text{acc}}(\rmbf(h, \Pmbb))$ $\,=\, \sum_{i} a_i r_i(h, \Pmbb)$, for weights $a_i \in \mathbb{R}_{+}$, the G-mean, and many more metrics~\citep{sokolova2009systematic}. 
Unless specified, we treat metrics as utilities, i.e., larger values are better. 
Since the metric's scale does not affect the learning problem~\citep{narasimhan2015consistent}, we allow $\phi : [0, 1]^{k}  \rightarrow [0,1]$.

\emph{Feasible rates:} We will restrict our attention to only those rates that are feasible, i.e., can be achieved by some classifier. The set of all feasible rates is given by: 
% \vspace{-0.25cm}
\[\Rcal = \{\rmbf(h, \Pmbb) \,:\, h \in \Hcal \}.\]
% \vskip -0.2cm
To avoid notational clutter, we suppress the dependence on $\Pmbb$ and $h$ when it is clear from the context.
% \vspace{-0.1cm}
\subsection{Metric Elicitation: Problem Setup}
\label{ssec:me}
% \vskip -0.1cm
% We now describe the problem of \emph{Metric Elicitation}. % comprising rates. 
% Our definitions follow from~\citet{hiranandani2019multiclass}.
We now describe the problem of \emph{Metric Elicitation}, % comprising rates. 
which follows~\citet{hiranandani2019multiclass}. There is an \textit{unknown} metric $\phi$, and we seek to elicit its form by posing queries to an \emph{oracle}, asking which of two classifiers it prefers. The oracle has access to the metric $\phi$ and responds by comparing its value on the two classifiers.
% and are provided here for completeness.
\bdefinition
[Oracle Query] Given two classifiers $h_1, h_2$ (equivalently, rates $\rmbf_1, \rmbf_2$, respectively), a query to the oracle (with metric $\phi$) is represented by:
% \vspace{-0.1cm}
\begin{align}
\Gamma(h_1, h_2\,;\, \phi) = \Omega(\rmbf_1, \rmbf_2\,;\,\phi) &= \1[\phi(\rmbf_1) > \phi(\rmbf_2)], 
% =: \1[\rmbf_1 \succ \rmbf_2],
\end{align}
% \vskip -0.15cm
\noindent where $\Gamma: \Hcal \times \Hcal \rightarrow \{0,1\}$ and $\Omega: \Rcal \times \Rcal \rightarrow \{0, 1\}$. The query asks whether $h_1$ is preferred to $h_2$ (equiv. if $\rmbf_1$ is preferred to $\rmbf_2$), as measured by $\phi$. 
\label{def:query}
\edefinition
\vskip -0.1cm
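
For illustration, consider $k = 2$ and two rate vectors $\rmbf_1 = (1, 0.45)$ and $\rmbf_2 = (0.7, 0.7)$, which we assume (hypothetically) are both feasible. The oracle's response depends on its metric. Under the balanced accuracy, i.e., $\phi^{\text{acc}}$ with $a_1 = a_2 = \tfrac{1}{2}$, and under the two-class G-mean, written here as $\phi^{\text{gm}}(\rmbf) = \sqrt{r_1 r_2}$ (notation used only for this illustration), we get
\begin{align*}
\phi^{\text{acc}}(\rmbf_1) &= 0.725 > 0.7 = \phi^{\text{acc}}(\rmbf_2)
&&\Longrightarrow\quad \Omega(\rmbf_1, \rmbf_2\,;\,\phi^{\text{acc}}) = 1, \\
\phi^{\text{gm}}(\rmbf_1) &= \sqrt{0.45} \approx 0.671 < 0.7 = \phi^{\text{gm}}(\rmbf_2)
&&\Longrightarrow\quad \Omega(\rmbf_1, \rmbf_2\,;\,\phi^{\text{gm}}) = 0.
\end{align*}
The same pair of rates thus yields opposite responses under the two metrics, and it is this dependence that the queries exploit.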

In practice, the oracle can be an expert, a group of experts, or an entire user population. The ME framework can be applied by posing classifier comparisons directly via interpretable learning techniques~\citep{ ribeiro2016should} 
% ~\cite{ribeiro2016should, doshi2017towards} 
or via A/B testing~\citep{tamburrelli2014towards}. For example, in an internet-based application 
% one may perform A/B testing by deploying two classifiers A and B with two different sub-populations of users and use their level of engagement to decide the preference over the two classifiers. 
one may perform the A/B test by deploying two classifiers A and B with two different sub-populations of users and use their level of engagement to decide the preference over the two classifiers. 
% which of the two classifiers is preferred.  
For other applications, one may present %to the user, 
visualizations of rates of the two classifiers 
% (e.g.,  \cite{zhang2020joint,beauxis2014visualization}), 
% (e.g.,  \cite{ beauxis2014visualization,shen2020designing}), 
\citep[e.g.,][]{shen2020designing}, 
and have the user provide a preference (see Appendix~\ref{append:userstudy} for an example). 
% See e.g.\  Figure 3 in~\cite{zhang2020joint} or Figure 2 in~\cite{beauxis2014visualization} for intuitive ways to visualize rates. 
%Next, we state the ME problem. 
Moreover, since the metrics we consider are functions of only the predictive rates, queries comparing classifiers are the same as queries on the associated rates. So for convenience, we will have our algorithms pose queries comparing two (feasible) rates.
%, but they can be equivalently seen as comparing two classifiers. 
Indeed, given a feasible rate, one can efficiently find the associated classifier (see Appendix~\ref{append:ssec:sphere} for details). % \cite{narasimhan2018learning}. 
We next formally state the ME problem.
\bdefinition [Metric Elicitation with Pairwise Queries (given $\{(\xmbf,y)_i\}_{i=1}^n$)~\citep{hiranandani2018eliciting, hiranandani2019multiclass}] Suppose that the oracle's (unknown) performance metric is $\phi$.  Using oracle queries of the form $\Omega(\rmbfhat_1, \rmbfhat_2\,;\,\phi)$, where $\rmbfhat_1, \rmbfhat_2$ are the estimated rates from samples, recover a metric $\hphi$ such that $\Vert\phi - \hphi\Vert < \kappa$ under a  suitable norm $\Vert \cdot \Vert$ for sufficiently small error tolerance $\kappa > 0$.
\label{def:me}
\edefinition
\vskip -0.2cm

The performance of ME is evaluated both by the query complexity and the quality of the elicited metric~\citep{hiranandani2018eliciting, hiranandani2019multiclass}. As is standard in the decision theory literature~\citep{koyejo2015consistent},
% ~\cite{koyejo2015consistent, hiranandani2018eliciting, hiranandani2019multiclass}, 
we present our ME approach by first assuming access to population quantities such as the population rates $\rmbf(h, \Pmbb)$, and then examine the estimation error from finite samples, i.e., with empirical rates $\rmbfhat(h, \{(\xmbf,y)_i\}_{i=1}^n)$. 

% \vspace{-0.2cm}
\subsection{Linear Metric Elicitation}% -- Warmup}
\label{ssec:mpme}

As a warm-up, we review the Linear Performance Metric Elicitation (LPME) procedure of \citet{hiranandani2019multiclass}, which we will use as a subroutine.  
% while eliciting quadratic metrics. 
Here we assume that the oracle's metric is a linear function of rates $\phi^{\text{lin}}(\rmbf) \coloneqq \inner{\ambf}{\rmbf}$, for some unknown weights $\ambf \in \Rmbb^k$. 
In other words, given two rates $\rmbf_1$ and $\rmbf_2$, the oracle returns $\1[\inner{\ambf}{\rmbf_1} > \inner{\ambf}{\rmbf_2}]$. Since the metrics are scale invariant~\citep{narasimhan2015consistent}, without loss of generality (w.l.o.g.), one may assume $\Vert \ambf \Vert_2=1$. The goal is to elicit (the slope of) $\ambf$ using pairwise comparisons over rates.
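
To see concretely why only the direction of $\ambf$ can be recovered, note that for any scalar $c > 0$ and any two rates $\rmbf_1, \rmbf_2$,
\begin{align*}
\1[\inner{c\,\ambf}{\rmbf_1} > \inner{c\,\ambf}{\rmbf_2}]
= \1[c\,\inner{\ambf}{\rmbf_1 - \rmbf_2} > 0]
= \1[\inner{\ambf}{\rmbf_1 - \rmbf_2} > 0].
\end{align*}
The oracle's responses under $\ambf$ and $c\,\ambf$ therefore coincide on every query, so no sequence of pairwise comparisons can distinguish the two, and fixing $\Vert \ambf \Vert_2 = 1$ loses no information.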
%Analogously, the oracle queries are $\tiny{\Omega\left( \rmbf_1, \rmbf_2 \,;\, \xi \right) \coloneqq \1[\xi(\rmbf_1) > \xi(\rmbf_2)]}$. 

When the number of classes $k = 2$, the coefficients $\ambf$ can be elicited using a one-dimensional binary search. When $k > 2$, one can apply a  coordinate-wise procedure, performing a binary search in one coordinate, while keeping the others fixed. The efficacy of this procedure, however, hinges on the geometry of the %underlying set of feasible rates $\Rcal$, 
set of rates $\Rcal$. Before discussing the geometry, we make a mild assumption 
% on the distribution 
that ensures some signal for non-trivial classification.
\bassumption%[\cite{hiranandani2019multiclass}]
\label{assump:distribution}
The conditional-class distributions are distinct, i.e., $\Pmbb(Y=i|X) \ne \Pmbb(Y=j|X)$
$\forall \; i \ne j$.%, P(Y=i|X) \ne P(Y=j|X)$.
\label{as:sphere}
\eassumption
% \vskip -0.1cm
% that the class-conditional distributions are not identical, i.e., there is some signal for non-trivial classification~\cite{hiranandani2019multiclass}.

Let $\embf_i \in \{0,1\}^k$ denote the rates achieved by a trivial classifier that predicts class $i$ for all inputs. 
% inputs. %, i.e., classifiers predicting only class $i$ on the entire space $\Xcal$. 
% \vspace{-0.1cm}

\begin{figure}[t]
% \hspace{-0.25cm}
\begin{tikzpicture}[scale = 1.0]
    \begin{scope}[scale = 0.6]\scriptsize
    
    \def\r{0.12};
    
    \coordinate (a) at (-0.4,1);
    \coordinate (b) at (0.6, 3.25);
    \coordinate (c) at (7, 4);
    \coordinate (d) at (6.5, 2);
    \coordinate (e) at (2.5, -0.75);
    \coordinate (f) at (-0.1, -0.5);
    
    \coordinate (labelleft) at (3, -1.25);
    
    \coordinate (Cent) at (3,1.75);
    \coordinate (Centcent) at (2.85,1.90);
    \coordinate (Cent1) at (4.45,1.75);
    \coordinate (Cent2) at (3,3.2);
    \coordinate (CentL) at (1.55,1.75);
    
    \coordinate (Space1) at (0.2,0.2);
    \coordinate (SpaceR) at (0.25,3.25);
    \coordinate (Spacem) at (-0.1,2.5);
    
    \coordinate (Sphere) at (5,3.25);
    \coordinate (Sphere0) at (3,1);
    \coordinate (Sphere1) at (4.45,1);
    \coordinate (Sphere2) at (3,2.45);
    \coordinate (SphereL) at (1.65,1);
    \coordinate (Sphereminus) at (3.4,0.95);
    
    \coordinate (r) at (3.75,2);
    
    \coordinate (u11) at (-0.25, -0.25);
    \coordinate (uextra121) at (1, 3.8);
    \coordinate (u21) at (4, 4.5);
    \coordinate (uextra2k1) at (6, 1);
    \coordinate (uk1) at (3.5, -0.5);
    
    \coordinate (u12) at (0.3, -0.70);
    \coordinate (uextra122) at (0.65, 2.65);
    \coordinate (u22) at (3.4, 4.2);
    \coordinate (uextra2k2) at (5.15, 1.6);
    \coordinate (uk2) at (3.75, 0.1);
    
    \coordinate (u13) at (-0.30, 0.10);
    \coordinate (uextra123) at (1.25, 3.25);
    \coordinate (u23) at (4, 4.5);
    \coordinate (uextra2k3) at (5.75, 1);
    \coordinate (uk3) at (3.75, -0.5);
    
    \coordinate (u14) at (-0.30, 0.10);
    \coordinate (uextra124) at (1.25, 3.25);
    \coordinate (u24) at (4, 4.5);
    \coordinate (uextra2k4) at (5.75, 1);
    \coordinate (uk4) at (3.75, -0.5);
    
    
    \fill[color=black] 
            (Cent) circle (0.08)
            (u11) circle (0.08)
            (u21) circle (0.08)
            (uk1) circle (0.08);
    
    \draw[thick] (u11) .. controls (a) and (b) .. (uextra121) 
    -- (u21) .. controls (c) and (d) .. (uextra2k1) -- (uk1) .. controls  (e) and (f) .. (u11);
    
    % \draw[dashed, brown, thick] (u11) .. controls (-2.25, 1.25) and (0.9, 3) .. (u21) .. controls (8.5,4.75) and (7,2) .. (6.5, 1.25) .. controls (6.5, 1) and (5.25, 0.3) .. (uk1) -- (u11);
    
    % \draw[dashed, red, thick] (u11) .. controls (-1.5, 1.5) and (-0.5, 3.5) .. (1.5, 4.5) 
    % -- (u21) .. controls (6.5, 3.5) and (6, 2) .. (6, 1.75) -- (uk1) .. controls  (3, -1) and (-0.25, -0.75) .. (u11);
    
    \draw[thick] (Cent) circle (2cm);
    
    
    \draw[thick, dotted] (Cent) circle (0.5cm);
    \draw[thick, dotted] (Cent1) circle (0.5cm);
    \draw[thick, dotted] (Cent2) circle (0.5cm);
    \draw[thick, dotted] (CentL) circle (0.5cm);
    % \node at (Space1) {{$\Rcal^1$}};
    \node at (SpaceR) {{$\Rcal$}};
    % \node at (Spacem) {{$\Rcal^m$}};
    
    \node at (Sphere) {\large{${\Scal}$}};
    \node at (Sphere0) {\tiny{$\Scal_{\ombf}$}};
    \node at (Sphere1) {\tiny{$\Scal_{\zmbf_1}$}};
    \node at (Sphere2) {\tiny{$\Scal_{\zmbf_2}$}};
    \node at (SphereL) {\tiny{$\Scal_{-\zmbf_1}$}};
    
    \node[below right] at (Centcent) {$\ombf$};
    
    \node[below] at (u11) {{$\embf_1$}};
    \node[above] at (u21) {{$\embf_2$}};
    \node[below right] at (uk1) {{$\embf_k$}};
    
    \node at (labelleft) {{\normalsize{(a)}}};
    
     \end{scope}
     
     \begin{scope}[shift={(4.3,0)},scale = 0.5]\scriptsize
    
    \def\r{0.12};
    
    \coordinate (a) at (-0.2,1);
    \coordinate (b) at (0.8, 2.75);
    \coordinate (c) at (7, 4);
    \coordinate (d) at (6.5, 2);
    \coordinate (e) at (2.5, -0.75);
    \coordinate (f) at (-0.1, -0.5);
    
    \coordinate (labelright) at (3, -1.5);
    
    \coordinate (Cent) at (3,1.75);
    \coordinate (Centcent) at (2.85,1.90);
    \coordinate (CentR) at (3.6,2.35);
    \coordinate (CentL) at (2.4,1.15);
    
    \coordinate (Space1) at (0.2,0.2);
    \coordinate (Space2) at (-0.3,1.3);
    \coordinate (Spacem) at (-0.1,2.5);
    
    \coordinate (Sphere) at (4.5,3);
    \coordinate (Sphereplus) at (2.6,2.55);
    \coordinate (Sphereminus) at (3.4,0.95);
    
    \coordinate (r) at (3.75,2);
    
    \coordinate (u11) at (-0.25, -0.25);
    \coordinate (uextra121) at (1.25, 3.75);
    \coordinate (u21) at (4, 4.5);
    \coordinate (uextra2k1) at (6, 1);
    \coordinate (uk1) at (3.5, -0.5);
    
    \coordinate (u12) at (0.3, -0.70);
    \coordinate (uextra122) at (0.65, 2.65);
    \coordinate (u22) at (3.4, 4.2);
    \coordinate (uextra2k2) at (5.15, 1.6);
    \coordinate (uk2) at (3.75, 0.1);
    
    \coordinate (u13) at (-0.30, 0.10);
    \coordinate (uextra123) at (1.25, 3.25);
    \coordinate (u23) at (4, 4.5);
    \coordinate (uextra2k3) at (5.75, 1);
    \coordinate (uk3) at (3.75, -0.5);
    
    \coordinate (u14) at (-0.30, 0.10);
    \coordinate (uextra124) at (1.25, 3.25);
    \coordinate (u24) at (4, 4.5);
    \coordinate (uextra2k4) at (5.75, 1);
    \coordinate (uk4) at (3.75, -0.5);
    
    
    \fill[color=black] 
            (Cent) circle (0.08)
            (u11) circle (0.08)
            (u21) circle (0.08)
            (uk1) circle (0.08);
    
    \draw[dashed, blue, thick] (u11) .. controls (a) and (b) .. (uextra121) 
    -- (u21) .. controls (c) and (d) .. (uextra2k1) -- (uk1) .. controls  (e) and (f) .. (u11);
    
    \draw[dashed, brown, thick] (u11) .. controls (-2.25, 1.25) and (0.9, 3) .. (u21) .. controls (8.5,4.75) and (7,2) .. (6.5, 1.25) .. controls (6.5, 1) and (5.25, 0.3) .. (uk1) -- (u11);
    
    \draw[dashed, red, thick] (u11) .. controls (-1.5, 1.5) and (-0.5, 3.5) .. (1.5, 4.5) 
    -- (u21) .. controls (6.5, 3.5) and (6, 2) .. (6, 1.75) -- (uk1) .. controls  (3, -1) and (-0.25, -0.75) .. (u11);
    
    \draw[thick] (Cent) circle (1.5cm);
    
    % \draw[thick, dotted] (CentR) circle (0.58cm);
    \node at (Space1) {{$\Rcal^1$}};
    \node at (Space2) {{$\Rcal^2$}};
    \node at (Spacem) {{$\Rcal^m$}};
    
    \node at (Sphere) {\large{$\overline{\Scal}$}};
    % \node at (Sphereplus) {\tiny{$\Scal^+_{\varrho}$}};
    
    \node[below right] at (Centcent) {$\ombf$};
    
    \node[below left] at (u11) {{$\embf_1$}};
    \node[above] at (u21) {{$\embf_2$}};
    \node[below right] at (uk1) {{$\embf_k$}};
    
    \node at (labelright) {{\normalsize{(b)}}};
    
     \end{scope}
     
    \end{tikzpicture}
    \vskip -0.2cm
      
     \caption{(a) Geometry of the set of predictive rates $\Rcal$: A convex set enclosing a sphere ${\Scal}$ with trivial rates $\embf_i \, \forall \, i \in [k]$ as vertices; (b) Geometry of the product set of group rates $\Rcal^1 \times \dots \times \Rcal^m$ (best seen in color)~\citep{hiranandani2020fair}; $\Rcal^u \, \forall \, u \in [m]$ are convex sets with common vertices $\embf_i \, \forall \, i \in [k]$ and enclose a sphere $\overline{\Scal} \subset \Rcal^1 \cap \dots \cap \Rcal^m$.}
        % \vskip -0.4cm
      \label{fig:geometry}
\end{figure}

\bproposition
[Geometry of $\Rcal$; Figure~\ref{fig:geometry}(a)] The set of rates $\Rcal \subseteq [0, 1]^{k}$ is convex, has vertices $\{\embf_i\}_{i=1}^k$, and
contains the rate profile $\ombf = \tfrac{1}{k} \sum_{i=1}^k \embf_i$ in the interior. Moreover, $\ombf$ is achieved by the uniform random classifier, which for any input predicts each class with equal probability.
\label{prop:C}
\eproposition
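
The convexity claim can be checked directly from the definition of rates; the following is a short sketch rather than a full proof. Writing $h_i(\xmbf)$ for the probability that $h(\xmbf)$ assigns to class $i$ (notation introduced here only for this argument), the rate in \eqref{eq:components} is $r_i(h) = \mathbb{E}[h_i(X) \mid Y = i]$, which is linear in $h$. Hence, for any $h_1, h_2 \in \Hcal$ and $\lambda \in [0, 1]$, the mixture $\lambda h_1 + (1 - \lambda) h_2$ is again in $\Hcal$ and satisfies
\begin{align*}
\rmbf(\lambda h_1 + (1 - \lambda) h_2) = \lambda\, \rmbf(h_1) + (1 - \lambda)\, \rmbf(h_2).
\end{align*}
In particular, the trivial classifier predicting class $i$ has $r_j = \1[j = i]$, so $\embf_i$ is the $i$-th standard basis vector, and the uniform random classifier with $h_i(\xmbf) = \tfrac{1}{k}$ has $r_i = \tfrac{1}{k}$ for all $i$, i.e., $\ombf = (\tfrac{1}{k}, \dots, \tfrac{1}{k})$.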
% \vskip -0.2cm
\bremark[Existence of sphere ${\Scal}$]
Since $\Rcal$ is convex and contains
the point $\ombf$ in the interior, there exists a %$q$-dimensional 
sphere ${\Scal} \subset \Rcal$ of non-zero radius $\rho$ centered at $\ombf$.  
\label{rem:sphere}
\eremark
% \vskip -0.2cm

By restricting the coordinate-wise binary search procedure to posing queries from within a sphere, LPME can be seen as minimizing a strongly-convex function and can be shown to converge to a solution $\ambfhat$ close to $\ambf$. 
Specifically, the LPME procedure 
takes any %query space
sphere $\Scal \subset \Rcal$, binary-search tolerance $\epsilon$, and the oracle $\Omega$ (with metric $\phi^{\text{lin}}$) %(\cdot, \cdot \,;\, \xi)$ 
 as input, and by posing %$O(q\log(\pi/2\epsilon))$ 
$O(k\log(1/\epsilon))$
queries recovers coefficients $\ambfhat$ with %$\Vert \ambfhat \Vert_2=1$  such that 
$\Vert \ambf - \ambfhat \Vert_2 \leq O(\sqrt{k}\epsilon)$. %(Theorem~2 in~\cite{hiranandani2019multiclass}). 
% The details can be found in Algorithm~2 in~\cite{hiranandani2019multiclass} and are also provided in Appendix~\ref{append:sec:slme} for completeness. 
The details of the algorithm are provided in Appendix~\ref{append:sec:slme} for completeness, but the following remark is the most important for our subsequent discussion.
% and summarize the discussion with the following remark.

\bremark[LPME Guarantee]
Given any $k$-dimensional %space $\Rcal$ enclosing a 
sphere $\Scal \subset \Rcal$ and an oracle $\Omega$ with metric $\phi^{\textrm{\textup{lin}}}(\rmbf)\coloneqq\inner{\ambf}{\rmbf}$, %, and an oracle  $\Omega$ for the metric, %(\cdot, \cdot; \phi)$,
%$\Omega(\cdot, \cdot; \xi)$, 
the LPME algorithm (Algorithm~\ref{alg:slme}, Appendix~\ref{append:sec:slme}) provides an estimate $\ambfhat$ with $\Vert \ambfhat \Vert_2=1$ such that the estimated slope is close to the true slope, i.e.,  $\sfrac{{a}_i}{{a}_j} \approx \sfrac{\hat a_i}{\hat a_j} \; \forall \; i, j\in [k]$.
\label{rm:ratio}
\eremark
% \vskip -0.2cm

Note that the LPME procedure is closely tied to the scale invariance condition and thus only estimates the slope (direction) of the coefficient vector $\ambf$,
% $\nabla \phi(\rmbf)$,
%$\nabla \xi$, 
%i.e., the slope 
and not its magnitude. 
Despite this drawback, we will discuss how we can elicit quadratic metrics using LPME in Section~\ref{sec:quadme}. 
Also note that the algorithm takes as input an \emph{arbitrary} sphere $\Scal \subset \Rcal$, and restricts its queries to rate vectors within the sphere. 
 In Appendix~\ref{append:ssec:sphere}, we discuss an efficient procedure~\citep{hiranandani2019multiclass} for identifying a sphere %$\Scal$ 
 of suitable radius.
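
To build intuition for why querying within a sphere helps, consider the following simplified sketch for $k = 2$; it is meant only as an illustration, and the actual procedure is the one in Appendix~\ref{append:sec:slme}. Parametrize the boundary of a sphere $\Scal$ of radius $\rho$ by an angle $\theta$ and, using $\Vert \ambf \Vert_2 = 1$, write the unknown coefficients in terms of an angle $\theta^*$ (both parametrizations are introduced here for illustration):
\begin{align*}
\rmbf(\theta) = \ombf + \rho\,(\cos\theta, \sin\theta), \qquad
\ambf = (\cos\theta^*, \sin\theta^*)
\quad\Longrightarrow\quad
\phi^{\text{lin}}(\rmbf(\theta)) = \inner{\ambf}{\ombf} + \rho \cos(\theta - \theta^*).
\end{align*}
The metric value on the boundary thus depends on $\theta$ only through $\cos(\theta - \theta^*)$, is uniquely maximized at $\theta = \theta^*$, and decreases with the circular distance between $\theta$ and $\theta^*$. A query $\Omega(\rmbf(\theta), \rmbf(\theta'))$ therefore reveals which of the two angles is closer to $\theta^*$, and a binary search over $\theta$ can localize $\theta^*$ to within $\epsilon$ using $O(\log(1/\epsilon))$ queries. On a sphere, the maximizing direction coincides with $\ambf$ itself, which is why the boundary of $\Rcal$, whose shape is unknown, is not used directly.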

% \vspace{-0.2cm}
\subsection{Quadratic Performance Metrics}
\label{ssec:metric}
% \vskip -0.2cm
Equipped with the LPME subroutine, our aim is to elicit metrics that are quadratic functions of rates.

\bdefinition[Quadratic Metric] For a vector $ \ambf \in \Rmbb^k$  %, $\ambf \geq 0$ 
and a negative semi-definite 
matrix $\Bmbf \in NSD_k$ with $\Vert \ambf \Vert_2^2  + \Vert \Bmbf \Vert_F^2 = 1$ (w.l.o.g.\ due to scale invariance), we define:
% we define:
\vspace{-0.1cm}
\begin{equation}
    \phi^\quadr(\rmbf \,;\, \ambf, \Bmbf) = \inner{\ambf}{\rmbf} + \frac{1}{2} \rmbf^T \Bmbf \rmbf.
    \label{eq:quadmet}
\end{equation}
\vspace{-0.7cm}
\label{def:quadmet}
\edefinition
This family trivially includes the linear metrics
% discussed in the previous section~\cite{sokolova2009systematic, hiranandani2018eliciting, hiranandani2019multiclass} 
as well as many modern metrics outlined below: 

\bexample[Class-imbalanced learning]
\emph{In problems with imbalanced class proportions, it is common to use metrics that emphasize equal performance across all classes. One example is Q-mean 
% \cite{Lawrence+98,LiuCh11,menon2013statistical},
\citep{menon2013statistical},
which is the quadratic mean of rates:
{\small
$\phi^{\qmean}(\rmbf) = 1 - \tfrac{1}{k}\sum_{i=1}^k \left(1 - r_i \right)^2.$}
}
\eexample
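
To make the connection with Definition~\ref{def:quadmet} explicit, one can expand the square in the Q-mean; here $\mathbf{1}$ denotes the all-ones vector and $\mathbf{I}_k$ the $k \times k$ identity matrix:
\begin{align*}
\phi^{\qmean}(\rmbf)
= 1 - \frac{1}{k}\sum_{i=1}^k \left(1 - 2 r_i + r_i^2\right)
= \frac{2}{k}\inner{\mathbf{1}}{\rmbf} - \frac{1}{k}\,\rmbf^T\rmbf
= \inner{\ambf}{\rmbf} + \frac{1}{2}\,\rmbf^T \Bmbf\, \rmbf,
\quad \text{with } \ambf = \frac{2}{k}\mathbf{1}, \;\; \Bmbf = -\frac{2}{k}\mathbf{I}_k.
\end{align*}
The matrix $\Bmbf$ is negative semi-definite, as required. Before normalization, $\Vert \ambf \Vert_2^2 + \Vert \Bmbf \Vert_F^2 = \tfrac{4}{k} + \tfrac{4}{k} = \tfrac{8}{k}$, so multiplying both $\ambf$ and $\Bmbf$ by $\sqrt{k/8}$ yields the unit-norm form, which by scale invariance induces the same oracle responses.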

\bexample[Distribution matching]
\emph{
In certain binary classification applications, one needs the proportion of predictions 
%made by a classifier 
for each class (i.e., the coverage) to match a target distribution $\boldsymbol{\pi} \in \Delta_2$ 
% \cite{goh2016satisfying,narasimhan2018learning, narasimhan2019optimizing,Cotter:2019}. 
\citep{goh2016satisfying,narasimhan2018learning}. 
A %evaluation 
measure often used for this task is the squared difference between the per-class coverage and the target distribution: 
{\small$\phi^{\cov}(\rmbf) \,=\, 1 - \frac{1}{2}\sum_{i=1}^2 \left(\cov_i(\rmbf) - \pi_i\right)^2$}, where 
{\small$\cov_i(\rmbf) = r_i + 1 - r_{\neq i}$}. 
Similar metrics can be found in the quantification literature where the target is set to the class prior $\Pmbb(Y=i)$ \citep{Fab1, 
%Fab2, 
Kar16}. %,  in combination with an additional error term. 
We capture more general quadratic distance measures for distributions, e.g.\ {\small$(\bf{\cov}(\rmbf) - \boldsymbol{\pi})^{T}\Qmbf (\bf{\cov}(\rmbf)-\boldsymbol{\pi})$} for $\Qmbf \in NSD_2$ \citep{Lindsay08}. 
% The example extends to multiclass case, where one uses general rates $\Pmbb(h(X) = j | Y = i),\, i\neq j$ (see Appendix~\ref{append:generalquad}).
}
\label{ex:distmatchbin}
\eexample
\vspace{-0.1cm}


Lastly, we need the following assumption on the metric.

\bassumption
\label{assump:smoothness}
The gradient of $\phi^{\quadr}$ at the trivial rate $\ombf$ is non-zero, i.e., $\nabla \phi^{\quadr}(\rmbf)|_{\rmbf=\ombf} = \ambf + \Bmbf\ombf \neq \mathbf{0}.$
\label{as:smooth}
\eassumption
\vspace{-0.1cm}

The non-zero gradient assumption is reasonable for a concave $\phi^{\quadr}$, where it merely implies that the optimal classifier for the metric is not the uniform random classifier. 
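
As a quick check of this assumption on a concrete metric, take the Q-mean from the class-imbalanced example. Differentiating its definition and evaluating at $\ombf = \tfrac{1}{k}\mathbf{1}$, where $\mathbf{1}$ denotes the all-ones vector, gives
\begin{align*}
\nabla \phi^{\qmean}(\rmbf) = \frac{2}{k}\left(\mathbf{1} - \rmbf\right)
\quad\Longrightarrow\quad
\nabla \phi^{\qmean}(\rmbf)\big|_{\rmbf = \ombf} = \frac{2}{k}\left(1 - \frac{1}{k}\right)\mathbf{1} \neq \mathbf{0}
\quad \text{for all } k \geq 2,
\end{align*}
so the assumption holds, and normalizing the coefficients only rescales this gradient. By contrast, a hypothetical metric such as $-\Vert \rmbf - \ombf \Vert_2^2$, which is maximized exactly at $\ombf$, has zero gradient there and is excluded.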