\subsection{Kernelized Principal Component Analysis (KPCA)} \label{sec:KPCA}
In this part, we explain how to perform kernelized PCA in a reproducing kernel Hilbert space (RKHS). Consider the set of items $\xb_1, \xb_2, \ldots, \xb_{n} \in \mathbb{R}^d$ and the corresponding features $\phi_1, \phi_2, \ldots, \phi_{n}$. We assume that the $\phi_i$'s are linearly independent. Recall that $\mathcal{S}_\mathcal{X}\subset\mathcal{H}$ denotes the subspace spanned by $\{\phi(\xb_1), \phi(\xb_2), \ldots, \phi(\xb_{n})\}$. Let $\psi_1, \psi_2, \ldots, \psi_n$ be the $n$ principal component directions in this subspace. We show how to efficiently compute projections onto these directions using Kernelized Principal Component Analysis (KPCA). This matters because the principal components live in the possibly infinite-dimensional space $\mathcal{H}$, which makes optimizing directly in $\mathcal{H}$ intractable or impossible. The following procedure, which we summarize for completeness from \cite{chatpatanasiri2010new}, computes the projection of any point $\xb \in \mathbb{R}^d$ onto the principal component directions in time polynomial in $n=3|\mathcal{S}|$:
\begin{enumerate}
    \item Form the Gram matrix $\Kb \in \mathbb{R}^{n\times n}$ with entries $\Kb_{i,j}=k(\xb_i,\xb_j)$.
    \item Center the Gram matrix: $\overline{\Kb}=\Kb-\frac{1}{n}\mathbf{1}_{n\times n}\Kb-\frac{1}{n}\Kb\mathbf{1}_{n\times n}+\frac{1}{n^2}\mathbf{1}_{n\times n}\Kb\mathbf{1}_{n\times n}$, where $\mathbf{1}_{n\times n}$ is the $n\times n$ matrix of all ones.
    \item Compute all $n$ eigenvectors $\alpha_1, \ldots, \alpha_{n}$ of $\overline{\Kb}$ and form the matrix $\Ab = [\alpha_1, \ldots, \alpha_{n}]$.
    \item For any $\xb \in \mathbb{R}^d$ and any principal component $\psi_j$ with eigenvector $\alpha_j$, we have that $\langle \phi(\xb), \psi_j \rangle_\mathcal{H}=\sum_{i=1}^{n}\alpha_{i,j}k(\xb, \xb_i)$, where $\alpha_{i,j}$ denotes the $i$-th entry of $\alpha_j$.
    \item Therefore, for any $\xb \in \mathbb{R}^d$ we may represent $\phi(\xb)$ in terms of its projection onto $\psi_1, \ldots, \psi_{n}$ as
    \begin{equation*}
        \varphi(\xb)=\Ab^T[k(\xb, \xb_1), \ldots, k(\xb, \xb_{n})]^T
    \end{equation*}
\end{enumerate}
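The identity in step 4 can be checked in one line. As a sketch, assume that the kernel satisfies $k(\xb, \xb') = \langle \phi(\xb), \phi(\xb') \rangle_\mathcal{H}$ and that each principal direction is expanded in the features as $\psi_j = \sum_{i=1}^{n} \alpha_{i,j}\phi(\xb_i)$; this expansion is an assumption made here for illustration, and, as in the steps above, centering of the test point and normalization of the eigenvectors are omitted. Linearity of the inner product then gives
\begin{equation*}
    \langle \phi(\xb), \psi_j \rangle_\mathcal{H} = \Big\langle \phi(\xb), \sum_{i=1}^{n} \alpha_{i,j}\phi(\xb_i) \Big\rangle_\mathcal{H} = \sum_{i=1}^{n} \alpha_{i,j} \langle \phi(\xb), \phi(\xb_i) \rangle_\mathcal{H} = \sum_{i=1}^{n} \alpha_{i,j} k(\xb, \xb_i),
\end{equation*}
so each coordinate of $\varphi(\xb)$ requires only $n$ kernel evaluations and never an explicit representation of $\phi(\xb)$.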
In the remainder, we let $\varphi_i\in \mathbb{R}^{n}$ denote the KPCA representation of the random feature $\phi_i\in\mathcal{H}$ with respect to the set $\phi(\xb_1), \phi(\xb_2), \ldots, \phi(\xb_{n})$. The following representer theorem shows that, for a given set $\phi(\xb_1), \phi(\xb_2), \ldots, \phi(\xb_{n})$, we may instead optimize over the finite-dimensional vectors $\varphi_1, \ldots, \varphi_{n}$ without changing the optimal value.
\begin{prop}{(Theorem 1 of \cite{chatpatanasiri2010new})}\label{prop:representer theorem}
    Let $\{\overline{\psi}_i\}_{i=1}^n$ be any set of points in $\mathcal{H}$ such that $\mathrm{Span}\left(\{\overline{\psi}_i\}_{i=1}^n\right)=\mathcal{S}_\mathcal{X}$, and let $\mathcal{H}'$ be a Hilbert space such that $\mathcal{H}$ and $\mathcal{H}'$ are separable. For any objective function $f$, the optimization problem
    \begin{equation*}
        \min_L f\left(\{\langle L\phi_i, L\phi_j\rangle_{\mathcal{H}'}\}_{i,j\in [n]}\right)
    \end{equation*}
    over bounded linear maps $L : \mathcal{H} \rightarrow \mathcal{H}'$ has the same optimal value as
    \begin{equation*}
        \min_{{L}'\in\mathbb{R}^{n\times n}} f\left(\{\overline{\psi}(\xb_i)^T{{L}'}^T{L}'\overline{\psi}(\xb_j)\}_{i,j\in [n]}\right)
    \end{equation*}
    where $\overline{\psi}(\xb)=[\langle\phi(\xb), \overline{\psi}_1 \rangle_\mathcal{H}, \ldots, \langle\phi(\xb), \overline{\psi}_n \rangle_\mathcal{H}]^T\in \mathbb{R}^n$.
\end{prop}
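To connect Proposition~\ref{prop:representer theorem} with the KPCA procedure, one natural instantiation takes $\overline{\psi}_j=\psi_j$ for $j \in [n]$. This is a sketch that assumes the principal component directions $\psi_1, \ldots, \psi_n$ span $\mathcal{S}_\mathcal{X}$. Under this choice, the $j$-th entry of $\overline{\psi}(\xb)$ is $\langle \phi(\xb), \psi_j \rangle_\mathcal{H}$, which is exactly the $j$-th entry of $\varphi(\xb)$, and therefore
\begin{equation*}
    \overline{\psi}(\xb_i)^T{{L}'}^T{L}'\overline{\psi}(\xb_j) = \varphi_i^T{{L}'}^T{L}'\varphi_j \quad \text{for all } i,j\in [n].
\end{equation*}
The finite-dimensional problem then depends on the data only through the vectors $\varphi_1, \ldots, \varphi_n \in \mathbb{R}^n$.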
