\subsection{Learning Kernelized Metrics in Practice}\label{sec: learning M}
We define the following finite-dimensional constrained convex program to learn a kernelized Mahalanobis metric from a random set of triplets $\mathcal{S}$:
\begin{equation}
\begin{aligned}
\min_{\Mb\succeq 0} \quad & \widehat{\overline{R}}_\mathcal{S}(\Mb)\\
\textrm{s.t.} \quad & \|\Mb\|_F \leq \lambda_F
\end{aligned}\tag{P4}\label{opt-P4}
\end{equation}
\iffalse
\begin{eqnarray}
\min_{\Mb \in \mathbb{R}^{n\times n}, \Mb \succeq 0, \|\Mb\|_F \leq \lambda_F} \quad  \widehat{\overline{R}}_\mathcal{S}(\Mb) \label{finite_opt}
\\ & \hspace{-38mm}\text{s.t.} \quad  |\|\varphi_h-\varphi_i\|_\Mb^2-\|\varphi_h-\varphi_j\|_\Mb^2|\leq \gamma, \nonumber
\end{eqnarray}
\fi
where $\Mb \succeq 0$ denotes that $\Mb$ is positive semidefinite and the norm constraint prevents overfitting, as in (\ref{opt-P1}), (\ref{opt-P2}), and (\ref{opt-P3}). Let $\widehat{\Mb}$ denote an optimal solution to (\ref{opt-P4}), referred to as the empirical risk minimizer. If we instead impose $\|\mathcal{P}^\dagger_{\mathcal{S}_\mathcal{X}}L^\dagger L\mathcal{P}_{\mathcal{S}_\mathcal{X}}\|_{S_2}\leq\lambda_*$, this corresponds to the constraint $\|\Mb\|_*\leq\lambda_*$, where $\|\cdot\|_*$ denotes the nuclear norm, and we may solve for $\widehat{\Mb}$ subject to this constraint instead. Below, Proposition \ref{prop:operationalizing} states the relation between (\ref{opt-P3}) and (\ref{opt-P4}). We then show how to obtain $\widehat{L}_0$ from the finite-dimensional solution.
\begin{prop}\label{prop:operationalizing}
    Optimization problems (\ref{opt-P4}) and (\ref{opt-P3}) are equivalent: solving (\ref{opt-P4}) is equivalent to learning $\widehat{L}_0$, and $\widehat{L}_0$ can be regarded as the Hilbert space counterpart of the finite-dimensional operator $\widehat{\Mb}$. Furthermore, let $\Psi_1,\ldots,\Psi_n \in \mathcal{H}$ be the KPCA directions for the span $\mathcal{S}_{\mathcal{X}}$. Then $\widehat{L}_0$ can be written as
\begin{eqnarray}
\widehat{L}_0:\widehat{L}_0\phi_x =    \sum_{i=1}^n\sum_{j=1}^nw_{i,j}\Psi_i\otimes \Psi_j \mathcal{P}_{\mathcal{S}_{\mathcal{X}}}\phi_x \label{L0 def}
\end{eqnarray}
where $\Psi_i\otimes \Psi_j \phi_x = \langle \Psi_j, \phi_x\rangle_\mathcal{H}\Psi_i$, $w_{i,j}$ denotes the $(i,j)$th entry of $\Wb$, and $\Wb= \text{Chol}(\widehat{\Mb})$ is a Cholesky factor of $\widehat{\Mb}$, i.e., $\Wb\Wb^T=\widehat{\Mb}$.
\end{prop}
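To illustrate (\ref{L0 def}), consider a hypothetical example with $n=2$, where the numbers are chosen purely for illustration and do not come from any experiment, and where $w_{i,j}$ is taken to be the $(i,j)$th entry of $\Wb$. Suppose
\[
\widehat{\Mb} = \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix},
\qquad
\Wb = \text{Chol}(\widehat{\Mb}) = \begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix},
\qquad
\Wb\Wb^T = \begin{pmatrix} 2\cdot 2 & 2\cdot 1 \\ 1\cdot 2 & 1\cdot 1 + 1\cdot 1 \end{pmatrix} = \widehat{\Mb}.
\]
Substituting $w_{1,1}=2$, $w_{1,2}=0$, $w_{2,1}=1$, and $w_{2,2}=1$ into (\ref{L0 def}) gives
\[
\widehat{L}_0\phi_x
= 2\,\langle \Psi_1, \mathcal{P}_{\mathcal{S}_{\mathcal{X}}}\phi_x\rangle_\mathcal{H}\,\Psi_1
+ \bigl(\langle \Psi_1, \mathcal{P}_{\mathcal{S}_{\mathcal{X}}}\phi_x\rangle_\mathcal{H}
+ \langle \Psi_2, \mathcal{P}_{\mathcal{S}_{\mathcal{X}}}\phi_x\rangle_\mathcal{H}\bigr)\Psi_2,
\]
so that, in this example, evaluating $\widehat{L}_0\phi_x$ requires only the two inner products of $\mathcal{P}_{\mathcal{S}_{\mathcal{X}}}\phi_x$ with the KPCA directions.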
Proposition \ref{prop:operationalizing} allows us to operationalize (\ref{opt-P3}) as a finite-dimensional convex optimization problem and to express $\widehat{L}_0$ in terms of its solution.
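
As a sketch of how (\ref{opt-P4}) might be solved numerically, and not necessarily the procedure used in this work, one standard option is projected subgradient descent. The symbols $\mathbf{A}$, $\mathbf{U}$, $\mu_k$, $\Pi_+$, $\Pi$, $\mathbf{G}_t$, and $\eta_t$ below are introduced only for this remark. Let $\mathbf{A}$ be a symmetric matrix with eigendecomposition $\mathbf{A} = \mathbf{U}\,\mathrm{diag}(\mu_1,\ldots,\mu_n)\,\mathbf{U}^T$. Its Frobenius-norm projection onto the positive semidefinite cone is
\[
\Pi_+(\mathbf{A}) = \mathbf{U}\,\mathrm{diag}\bigl(\max(\mu_1,0),\ldots,\max(\mu_n,0)\bigr)\,\mathbf{U}^T .
\]
Since the feasible set of (\ref{opt-P4}) is the intersection of a closed convex cone with a ball centered at the origin, the projection onto it is obtained by rescaling $\Pi_+(\mathbf{A})$:
\[
\Pi(\mathbf{A}) =
\begin{cases}
\Pi_+(\mathbf{A}), & \|\Pi_+(\mathbf{A})\|_F \leq \lambda_F,\\[2pt]
\dfrac{\lambda_F}{\|\Pi_+(\mathbf{A})\|_F}\,\Pi_+(\mathbf{A}), & \text{otherwise}.
\end{cases}
\]
Assuming a symmetric subgradient $\mathbf{G}_t$ of $\widehat{\overline{R}}_\mathcal{S}$ at the current iterate $\Mb_t$ and a step size $\eta_t>0$ are available, one iteration reads
\[
\Mb_{t+1} = \Pi\bigl(\Mb_t - \eta_t \mathbf{G}_t\bigr).
\]
For the nuclear norm variant, note that $\|\Mb\|_* = \mathrm{tr}(\Mb)$ whenever $\Mb\succeq 0$, so the constraint $\|\Mb\|_*\leq\lambda_*$ reduces to the linear constraint $\mathrm{tr}(\Mb)\leq\lambda_*$.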