\section{Implementation}

In this section, we discuss three implementation issues. 

The first issue is related to the AMF Learner 
in Definition \ref{def_AMFlearner}. 
Directly solving (\ref{eq:optPAMFL}) 
is not easy since $\Delta_{\alpha,\beta}(h)$ 
is non-convex. We propose to approximate the 
solution by solving  
\begin{equation}
\label{eq:appro_AMFlearner}
\min_{h \in H}\ \frac{1}{n} \sum_{i=1}^n 
\ell(h(x_i), y_i) + \lambda\,  
\tilde{\Delta}_{\alpha,\beta}(h; S),  
\end{equation}
instead, where $\lambda$ is a regularization 
coefficient and 
\begin{equation}
\tilde{\Delta}_{\alpha,\beta}(h; S) 
= \frac{1}{n^2 \beta^2} \sum_{i,j=1}^n M_{ij} 
\cdot |h(x_i) - h(x_j)|^2,      
\end{equation}
with $M$ being an $n$-by-$n$ matrix whose 
entries are defined as 
$M_{ij} = \mathbb{I}\{d(x_i,x_j) \leq \alpha\}$. 
This approximation is justified by the 
following lemma, which shows that 
$\tilde{\Delta}_{\alpha,\beta}(h; S)$ is an upper bound on 
$\Delta_{\alpha,\beta}(h; S)$, so driving the former toward 
zero also drives the latter toward zero. 
\begin{lemma}
\label{lem:passiveAMFL}
Fix any $\alpha, \beta > 0$. We have 
$\Delta_{\alpha,\beta}(h; S) \leq 
\tilde{\Delta}_{\alpha,\beta}(h; S)$ for 
any $h \in H$ and any sample $S$.
\end{lemma}
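To see why a bound of this form is plausible, the following sketch 
assumes that the empirical bias is the fraction of sample pairs that 
are within distance $\alpha$ but receive predictions more than $\beta$ 
apart. This reading is an assumption made here for illustration and 
should be checked against the formal definition of 
$\Delta_{\alpha,\beta}(h; S)$. Under it, 
\[
\Delta_{\alpha,\beta}(h; S) 
= \frac{1}{n^2} \sum_{i,j=1}^n M_{ij} \cdot 
\mathbb{I}\{|h(x_i) - h(x_j)| > \beta\} 
\leq \frac{1}{n^2} \sum_{i,j=1}^n M_{ij} \cdot 
\frac{|h(x_i) - h(x_j)|^2}{\beta^2} 
= \tilde{\Delta}_{\alpha,\beta}(h; S), 
\]
where the inequality uses $\mathbb{I}\{|t| > \beta\} \leq t^2 / \beta^2$ 
for every real $t$. 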
In practice, the approximate learner 
(\ref{eq:appro_AMFlearner}) may not 
always return a model with zero bias 
on training data. 
In this case, the proposed algorithm 
remains applicable and sample-efficient 
with respect to fairness. There are two possible 
theoretical explanations for the 
maintained efficiency. First, if 
the bias is sufficiently small, e.g., 
$\Delta_{\alpha,\beta}(h; S) \in 
O(\varepsilon)$, then the passive 
bound in Theorem \ref{thm:generalization} 
can be extended to $\Delta_{\alpha,\beta}(h)
\in O(\varepsilon)$. Plugging this 
back into Theorem \ref{thm:labelcomplexity}, 
we obtain a similar complexity bound, 
up to an additional constant factor. 
Second, we may borrow ideas from 
agnostic active learning (e.g., \cite{dasgupta2007general,balcan2009agnostic}) 
and develop a new complexity bound for 
the non-realizable case (i.e., when 
the learned $h$ does not have zero bias). These possible 
extensions are left for future study. 

The second implementation issue is related 
to the base model. 
We propose to implement a linear model 
and a kernel regression model approximated by 
random Fourier features \cite{rahimi2007random}; 
we call the latter the `rff model'. 

For the linear model $h(x) = h^T x$, if instances $x_1, \ldots, 
x_{n} \in \mathbb{R}^p$, we can show 
$\tilde{\Delta}_{\alpha,\beta}(h; S) = 
\frac{2}{n^2 \beta^2} \cdot h^T [x] (D - M) [x]^T h$, 
where $[x]$ is a $p$-by-$n$ matrix with the $i$-th 
column being $x_{i}$. Further, if squared loss is used, 
then the solution to (\ref{eq:appro_AMFlearner}), obtained by 
setting its gradient with respect to $h$ to zero, is 
\begin{equation}
h = \Big( [x] \big(I + \frac{2 \lambda}{n \beta^2} 
(D - M) \big) [x]^T \Big)^{-1} [x] [y],  
\end{equation}
assuming the inverse exists, 
where $[y] \in \mathbb{R}^n$ is a vector with the 
$i$-th entry being $y_i$, $I$ is the $n$-by-$n$ 
identity matrix, and $D$ is 
an $n$-by-$n$ diagonal matrix with 
$D_{ii} = \sum_{j=1}^n M_{ij}$. 
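
The quadratic form of the penalty can be checked in a few lines. 
As a sketch, let $f \in \mathbb{R}^n$ collect the predictions 
$f_i = h(x_i)$ (the symbol $f$ is introduced here only for 
illustration), and assume $M$ is symmetric, which holds whenever 
$d$ is symmetric. Expanding the square gives 
\[
\sum_{i,j=1}^n M_{ij} (f_i - f_j)^2 
= \sum_{i,j=1}^n M_{ij} f_i^2 
+ \sum_{i,j=1}^n M_{ij} f_j^2 
- 2 \sum_{i,j=1}^n M_{ij} f_i f_j 
= 2 f^T D f - 2 f^T M f 
= 2 f^T (D - M) f. 
\]
Dividing by $n^2 \beta^2$ and substituting the linear predictions 
$f_i = h^T x_i$ recovers the expression for 
$\tilde{\Delta}_{\alpha,\beta}(h; S)$ stated above. Since the 
left-hand side is a sum of nonnegative terms, the same identity 
shows that $f^T (D - M) f \geq 0$, so the penalty is a convex 
quadratic function of $h$. 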

For the rff model, we first calculate random 
features \cite{rahimi2007random} and then 
train a linear model based on them using 
the AMF learner. Note that the random features are only 
used to approximate the prediction model; 
we still measure $d(x,x')$ using the 
original features. 
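
For concreteness, one common random-feature construction from 
\cite{rahimi2007random} is sketched here for a Gaussian kernel. 
The kernel choice, the bandwidth $\sigma$, and the number of 
features $m$ are illustrative assumptions rather than settings 
stated in this section. Draw 
$w_1, \ldots, w_m \sim \mathcal{N}(0, \sigma^{-2} I_p)$ and 
$b_1, \ldots, b_m \sim \mathrm{Uniform}[0, 2\pi]$ independently, 
and map each instance to 
\[
z(x) = \sqrt{\frac{2}{m}} \, \big( \cos(w_1^T x + b_1), \ldots, 
\cos(w_m^T x + b_m) \big)^T \in \mathbb{R}^m, 
\]
so that $\mathbb{E}[z(x)^T z(x')] = 
\exp(-\|x - x'\|^2 / (2 \sigma^2))$. 
The linear model is then trained on $z(x_1), \ldots, z(x_n)$ 
in place of $x_1, \ldots, x_n$, while $M$ and $D$ are still 
computed from $d(x_i, x_j)$ on the original features. 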

The last issue is related to active learning. 
Given a labeled training set $L$ and an 
unlabeled set $U$, the proposed active AMF 
learner labels a candidate instance $u$ 
if there exists $u' \in L$ satisfying 
$d(u,u') \leq \alpha$ and $|h(u) - h(u')| > \beta$.
In principle, we can also pair $u$ with 
instances in $U$, as long as the labeled 
instances fall in $\mathcal{C}_{\alpha,\beta}(V_q)$. 
In practice, pairing $u$ with instances 
in $L$ is often more efficient (since 
the labeled set is typically much smaller 
than the unlabeled set) and leads to 
slightly better performance in our 
experiments. 


\begin{figure*}[t!]
     \centering
        \begin{subfigure}{.24\textwidth}
         \centering
         \includegraphics[width=\textwidth]{figure/insurance_lambda_delta.png}
         \caption{$\Delta$ versus $\lambda$}
     \end{subfigure} 
     \begin{subfigure}{.24\textwidth}
         \centering
         \includegraphics[width=\textwidth]{figure/insurance_lambda_RMSE.png}
         \caption{RMSE versus $\lambda$}
     \end{subfigure}
     \begin{subfigure}{.24\textwidth}
         \centering
         \includegraphics[width=\textwidth]{figure/delta_tradeoff_linearinsurance.png}
         \caption{$\Delta$ vs $(\alpha,\beta)$ 
         Selection}
     \end{subfigure}
     \begin{subfigure}{.24\textwidth}
         \centering
         \includegraphics[width=\textwidth]{figure/rmse_tradeoff_linearinsurance.png}
         \caption{RMSE vs $(\alpha,\beta)$ 
         Selection}
     \end{subfigure}
     \caption{Results of Sensitivity Analysis}
    \label{fig:sensitivity}
\end{figure*}

