\section{PRACTICAL IMPLEMENTATION} \label{sec:practical}
In Section \ref{KernelizedML}, we show that solving (\ref{opt-P2}) with the
search restricted to $\mathcal{S}_\mathcal{X}$, i.e., solving (\ref{opt-P3}) for $\widehat{L}_0$, yields a solution to both (\ref{opt-P2}) and (\ref{opt-P3}). Our generalization error bounds are expressed in terms of $\widehat{L}_0$ (see Theorems \ref{thm:generalization_error_withbounded_Fro_norm} and \ref{thm:generalization_error_withbounded_Nuclear_norm}). The goal of this section is to solve (\ref{opt-P3}) to learn $\widehat{L}_0$, which is a nonlinear Mahalanobis metric. Note that, in addition to being possibly infinite dimensional, the optimization problem (\ref{opt-P3}) is nonconvex.

In this section, we show how to learn $\widehat{L}_0$ via convex optimization from a random set $\mathcal{S}$ of independent triplets with associated labels $y_t$. Although (\ref{opt-P3}) is nonconvex, we show that solving it is equivalent to solving a finite-dimensional convex optimization problem. We use a representer theorem (see Proposition \ref{prop:representer theorem}) to reduce finding $\widehat{L}_0$ to an optimization over finite-dimensional vectors. Following the idea of Kernel Principal Component Analysis (KPCA), we compute all distances using KPCA vectors $\varphi_1, \varphi_2, \ldots, \varphi_n\in \mathbb{R}^{n}$, which reduces the problem to learning an $n$-dimensional metric parameterized by a positive semidefinite matrix $\Mb$:
 \begin{equation}
\begin{aligned}
\widehat{\overline{R}}_\mathcal{S}(\Mb) := \frac{1}{|\mathcal{S}|}\sum_{(t,y_t)\in \mathcal{S}}\ell\big(y_t(\|\varphi_h-\varphi_i\|^2_\Mb-\|\varphi_h-\varphi_j\|^2_\Mb)\big)
\end{aligned}
\end{equation}
where $t=(h,i,j)$ denotes a triplet, $\|\varphi\|^2_\Mb=\varphi^\top\Mb\varphi$, $n=3|\mathcal{S}|$ (each triplet contributes three points), and $\varphi_i\in \mathbb{R}^{n}$ denotes the KPCA representation of the feature $\phi_i\in\mathcal{H}$ computed from the random set $\phi(\xb_1), \phi(\xb_2), \ldots, \phi(\xb_{n})$. We refer to $\widehat{\overline{R}}_\mathcal{S}(\Mb)$ as the (finite-dimensional) empirical risk of $\Mb$. The metric $\widehat{L}_0$ can then be expressed in terms of the solution of the corresponding constrained (finite-dimensional) empirical risk minimization problem. In Section \ref{sec:KPCA}, we use known results to explain how to perform KPCA, how to compute distances with the finite-dimensional KPCA vectors, and how norm constraints on $L$ translate into constraints on the finite-dimensional metric $\Mb$. Then, in Section \ref{sec: learning M}, we give the finite-dimensional optimization problem, with all constraints, that is equivalent to (\ref{opt-P3}) and show how to recover $\widehat{L}_0$ from its solution.
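To make the reduction concrete, the following Python sketch builds KPCA vectors from a kernel matrix and evaluates $\widehat{\overline{R}}_\mathcal{S}(\Mb)$ for a given $\Mb$. It is a minimal sketch under stated assumptions, not necessarily the procedure of Section \ref{sec:KPCA}: the Gaussian kernel, the logistic loss $\ell(z)=\log(1+e^{z})$ (which is 1-Lipschitz), the layout placing triplet $t=(h,i,j)$ in rows $3t$, $3t+1$, $3t+2$, and all function names are hypothetical choices. The KPCA vectors are taken as the rows of $U\Lambda^{1/2}$, where $K=U\Lambda U^\top$ is the eigendecomposition of the $n\times n$ kernel matrix, so that $\langle\varphi_a,\varphi_b\rangle=K_{ab}$ and hence $\|\varphi_a-\varphi_b\|^2=\|\phi(\xb_a)-\phi(\xb_b)\|_\mathcal{H}^2$. The centering step of standard KPCA is omitted; centering is a translation in $\mathcal{H}$ and would not change Euclidean distances between the vectors.

```python
import numpy as np


def gaussian_kernel(X, gamma=1.0):
    """Hypothetical kernel choice: K[a, b] = exp(-gamma * ||x_a - x_b||^2)."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp round-off negatives


def kpca_vectors(K):
    """Return Phi (n x n) whose row a is varphi_a in R^n, with
    varphi_a @ varphi_b == K[a, b], via K = U diag(lam) U^T (no centering)."""
    K = 0.5 * (K + K.T)              # enforce exact symmetry
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)    # drop tiny negative eigenvalues from round-off
    return U * np.sqrt(lam)          # scale column k of U by sqrt(lam_k)


def logistic_loss(z):
    """Assumed 1-Lipschitz loss l(z) = log(1 + exp(z)), computed stably."""
    return np.logaddexp(0.0, z)


def empirical_risk(Phi, M, y, loss):
    """Finite-dimensional empirical risk of M.

    Assumes triplet t = (h, i, j) occupies rows 3t, 3t+1, 3t+2 of Phi,
    so that n = 3 |S|. Constraints on M are NOT enforced here."""
    h, i, j = Phi[0::3], Phi[1::3], Phi[2::3]
    d_hi = np.einsum("td,de,te->t", h - i, M, h - i)  # ||phi_h - phi_i||_M^2
    d_hj = np.einsum("td,de,te->t", h - j, M, h - j)  # ||phi_h - phi_j||_M^2
    return float(np.mean(loss(y * (d_hi - d_hj))))


# Usage on synthetic data: |S| = 4 triplets, hence n = 12 points.
rng = np.random.default_rng(0)
num_triplets = 4
X = rng.normal(size=(3 * num_triplets, 2))
y = rng.choice([-1.0, 1.0], size=num_triplets)
K = gaussian_kernel(X, gamma=0.5)
Phi = kpca_vectors(K)
n = Phi.shape[0]
risk_identity = empirical_risk(Phi, np.eye(n), y, logistic_loss)
risk_zero = empirical_risk(Phi, np.zeros((n, n)), y, logistic_loss)
```

With $\Mb=\mathbf{0}$ every triplet contributes $\ell(0)=\log 2$, which gives a quick sanity check; with $\Mb=I_n$ the distances reduce to the kernel-induced distances in $\mathcal{H}$. The sketch only evaluates the risk: it neither imposes the constraints on $\Mb$ nor minimizes over them. Whether an increasing loss such as this one is appropriate depends on the label convention for $y_t$ used with (\ref{opt-P3}), which the sketch does not fix.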
