\textbf{Calculating Kernelized Mahalanobis Distances using KPCA:}
By Proposition \ref{prop:representer theorem}, one can learn $\widehat{L}_0$ using the KPCA representations of $\bm{x}_1, \bm{x}_2, \ldots, \bm{x}_n$. To be precise, given a linear map $L : \mathcal{H}\rightarrow \mathcal{H}$, we may expand the squared distance $\|L\phi_i-L\phi_j\|_\mathcal{H}^2=\langle L\phi_i,L\phi_i\rangle-2\langle L\phi_i,L\phi_j\rangle+\langle L\phi_j,L\phi_j\rangle$. Let $\Ab$ be as defined in kernelized PCA and let $\Phi:=[\phi_1, \phi_2, \ldots, \phi_n ]$ be the matrix whose columns are the $\phi_i$'s. As the $\phi_i$'s are linearly independent, $\Phi$ has full rank\footnote{In the case where the $\phi_i$'s are not linearly independent and $\Phi$ is no longer full rank, KPCA can be modified by projecting onto the $k < n$ eigenvectors corresponding to the nonzero eigenvalues.}. For any $\phi_k$ in the set $\{\phi_1, \phi_2, \ldots, \phi_n\}$, we have $L\phi_k=\Ub\Ab^T\Phi^T\phi_k$ for some linear map $\Ub$ from $\mathbb{R}^n$ to $\mathcal{H}$. Additionally, by definition of the kernel function $k(\cdot, \cdot)$, $\Phi^T\phi(\xb_k)=[k(\xb_k,\xb_1),\ldots,k(\xb_k,\xb_n)]^T$. Writing $\varphi_k := \Ab^T\Phi^T\phi_k$ for the KPCA representation of $\phi_k$, we therefore have $L\phi_k=\Ub\varphi_k$. Hence,
\begin{eqnarray*}
    \|L\phi_i-L\phi_j\|_\mathcal{H}^2 &=& \langle\Ub\varphi_i,\Ub\varphi_i\rangle-2\langle\Ub\varphi_i,\Ub\varphi_j\rangle \nonumber
    \\&+&\langle\Ub\varphi_j,\Ub\varphi_j\rangle \nonumber
    \\ &=& \|\Ub\varphi_i-\Ub\varphi_j\|_\mathcal{H}^2 \nonumber
    \\ &=&\|\varphi_i-\varphi_j\|_\Mb^2 \label{M for L}
\end{eqnarray*}
for $\varphi_i \in \mathbb{R}^n$ defined by kernelized PCA on $\phi_1, \phi_2, \ldots, \phi_n$, and $\Mb = \Ub^T\Ub \in \mathbb{R}^{n\times n}$. Therefore, for a given set $\phi_1, \phi_2, \ldots, \phi_n$, we may use kernelized PCA to compute distances efficiently in $\mathbb{R}^n$ rather than in $\mathcal{H}$.
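
To make the final equality explicit, the following short derivation may be helpful. It is a sketch under two assumptions that are consistent with the expressions above: that the KPCA representation is $\varphi_i = \Ab^T\Phi^T\phi_i$, and that $\Ub^T$ denotes the adjoint of $\Ub$. The symbol $\mathbf{k}_i := \Phi^T\phi_i = [k(\xb_i,\xb_1),\ldots,k(\xb_i,\xb_n)]^T$ is notation introduced here only for illustration.
\begin{eqnarray*}
    \varphi_i &=& \Ab^T\mathbf{k}_i,
    \\ \|\Ub\varphi_i-\Ub\varphi_j\|_\mathcal{H}^2 &=& \langle\Ub(\varphi_i-\varphi_j),\Ub(\varphi_i-\varphi_j)\rangle
    \\ &=& (\varphi_i-\varphi_j)^T\,\Ub^T\Ub\,(\varphi_i-\varphi_j)
    \\ &=& (\varphi_i-\varphi_j)^T\,\Mb\,(\varphi_i-\varphi_j).
\end{eqnarray*}
Under these assumptions, each pairwise distance depends only on the kernel evaluations $k(\xb_i,\xb_j)$, the matrix $\Ab$, and the $n\times n$ matrix $\Mb$, so no element of $\mathcal{H}$ needs to be formed explicitly.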