\section{INTRODUCTION}
Understanding how humans perceive objects is essential in many areas, from machine learning \citep{hu2015deep, hsieh2017collaborative} to psychology \citep{cao2013similarity, roads2019obtaining} and policy learning \citep{liu2021deep}. Learning representations of objects that reflect the similarities and dissimilarities in human perception is key to this understanding. Metric learning is the study of learning a distance function that represents such similarities and dissimilarities among objects. This is particularly useful in computer vision applications such as image retrieval \citep{hoi2010semi, yao2020adaptive} and face recognition \citep{guillaumin2009you, cao2013similarity}, and in recommendation systems \citep{zhang2019next, wu2020effective}, where the notion of similarity plays a central role in performance. Comparative judgments over objects have been widely used in these applications and many others as a powerful tool for understanding similarities and dissimilarities. In this paper, we provide a theoretical foundation for the task of metric learning from triplet comparisons of the form \textit{``is item $h$ more similar to item $i$ or to item $j$?''} (see Figure \ref{fig:triplets} for an example triplet comparison query from the Food-100 dataset \citep{wilber2014cost}). We aim to learn a distance function that predicts triplet comparisons as well as possible. Let $\bx\in\mathbb{R}^d$ denote the feature representation of an object. We are given a random set of triplet comparisons of the form
\begin{eqnarray*}
    \text{sign}(\text{dist}^2(\bm{x}_h,\bm{x}_i)-\text{dist}^2(\bm{x}_h,\bm{x}_j)),
\end{eqnarray*}
which compare the distances from a head item $\bx_h$ to two alternatives $\bx_i, \bx_j$. As an example, items may be images of products sold in an online marketplace, and the features $\bx_i$ could either be constructed from metadata about each product or extracted automatically from the image via a neural network. As human judgments are complex and involve higher-order interactions of features, we seek a sufficiently expressive family of distance metrics to model these judgments. Hence, we consider learning a nonlinear metric in a kernelized setting.

In the special case of a linear kernel, this corresponds to learning a Mahalanobis metric represented by a positive semidefinite matrix $\Mb$.
\iffalse
a triplet query for given objects $(\bm{x}_h, \bm{x}_i, \bm{x}_j)$ can be written as
\begin{eqnarray*}
 y_{t_{\{h,i,j\}}}= \text{sign}\left( \|\bm{x}_h-\bm{x}_i\|^2_{\textbf{M}}-\|\bm{x}_h-\bm{x}_j\|^2_{\textbf{M}} \right)
\end{eqnarray*}
where $y_{t_{\{h,i,j\}}}$ denotes the query result  for triplet $t_{\{h,i,j\}}$. 

In the linear case, learning the Mahalanobis metric corresponds to learning a symmetric positive semidefinite matrix $\Mb$, where 
\begin{eqnarray*}
    \text{dist}^2(\bm{x}_i, \bm{x}_j)=\|\bm{x}_i-\bm{x}_j\|^2_\Mb={(\bm{x}_i-\bm{x}_j)^T\Mb(\bm{x}_i-\bm{x}_j)}.
\end{eqnarray*}
\fi
As $\Mb$ is positive semidefinite, it admits a factorization $\Mb=\Lb^T\Lb$, for example via a Cholesky decomposition. Thus, learning the positive semidefinite matrix $\Mb$ can also be cast as learning a linear transformation $\Lb$, with distances interpreted as Euclidean distances between points transformed by $\Lb$. Our work extends this view to the kernelized setting: we focus on learning a linear metric on a reproducing kernel Hilbert space (RKHS).
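For reference, the linear-case reinterpretation above rests on the following identity, stated here only as an illustration and using the shorthand $\|\bm{v}\|_\Mb^2=\bm{v}^T\Mb\bm{v}$ (a notational assumption made for this display):
\begin{eqnarray*}
    \|\bm{x}_i-\bm{x}_j\|_\Mb^2 &=& (\bm{x}_i-\bm{x}_j)^T\Lb^T\Lb(\bm{x}_i-\bm{x}_j) \\
    &=& \|\Lb\bm{x}_i-\Lb\bm{x}_j\|_2^2.
\end{eqnarray*}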

\begin{figure*}[t]
    \centering
    \includegraphics[width=1\linewidth]{figures/image6.jpg}
    \caption{Metric learning from triplet comparisons (example triplets from the Food-100 dataset \citep{wilber2014cost}). $\mathcal{S}$ is the set of triplets and $y_t$ is the label collected from a human annotator for each triplet $t$.}
    \label{fig:triplets}
\end{figure*}

We assume that we have access to a feature map $\phi$ from $\mathbb{R}^d$ to a real reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ such that $\langle \phi(\bx_i), \phi(\bx_j)\rangle_\mathcal{H}=k(\bx_i, \bx_j)$ and $\|\phi(\bx)\|_\mathcal{H}=\sqrt{k(\bx, \bx)}$ for a known kernel function $k: \mathbb{R}^d\times \mathbb{R}^d \rightarrow \mathbb{R}$. The kernel $k(\cdot, \cdot)$ satisfies the reproducing property $\langle f, k(\cdot, \bx)\rangle_\mathcal{H}=f(\xb)$ for any $f\in \mathcal{H}$ and $\bx \in \mathbb{R}^d$. Then, for any bounded linear operator $L: \mathcal{H}\rightarrow \mathcal{H}$, we define an associated nonlinear Mahalanobis metric, $d_L$, as
\begin{eqnarray*}
    d_L^2 (\xb_i, \xb_j) &=& \|L\phi(\xb_i)-L\phi(\xb_j)\|_\mathcal{H}^2 \\
    &=& \langle L\phi(\xb_i)-L\phi(\xb_j), L\phi(\xb_i)-L\phi(\xb_j) \rangle_\mathcal{H}.
\end{eqnarray*}
For simplicity, we write $\phi_i$ for $\phi(\bx_i)$ in the rest of the paper. In the kernelized setting, triplet queries take the form
\begin{eqnarray*}
\text{sign}\left( \|L\phi_h-L\phi_i\|_\mathcal{H}^2 -\|L\phi_h-L\phi_j\|_\mathcal{H}^2 \right).
\end{eqnarray*}
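To see which quantities a triplet query depends on, one can expand the two squared norms. The following is a direct algebraic consequence of the definitions above and is included only as an illustration. The $\|L\phi_h\|_\mathcal{H}^2$ terms cancel, so the argument of the sign equals
\begin{eqnarray*}
\|L\phi_i\|_\mathcal{H}^2 - \|L\phi_j\|_\mathcal{H}^2 - 2\langle L\phi_h, L\phi_i - L\phi_j\rangle_\mathcal{H}.
\end{eqnarray*}
In the special case where $L$ is the identity operator, the relation $\langle \phi_i, \phi_j\rangle_\mathcal{H}=k(\bx_i, \bx_j)$ reduces this to kernel evaluations alone:
\begin{eqnarray*}
k(\bx_i, \bx_i) - k(\bx_j, \bx_j) - 2k(\bx_h, \bx_i) + 2k(\bx_h, \bx_j).
\end{eqnarray*}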
This paper advances the understanding of the empirically powerful task of nonlinear metric learning via two core theoretical contributions:
\begin{itemize}
    \item We establish the first generalization error and sample complexity guarantees for kernelized metric learning from triplet comparisons. 
    \item We provide insights into how regularization affects the sample complexity and generalization bounds for kernelized metric learning from triplet comparisons. 
\end{itemize}
As a byproduct, our analysis extends the results for the linear metric learning setting of \cite{mason2017learning} and removes a limitation of that work, which required the number of items $n$ to be larger than the dimensionality $d$.





