\section{DISCUSSION} \label{appendix:discussion}
Our results extend the linear metric setting of \cite{mason2017learning} in two key ways. First, our main results provide generalization error and sample complexity bounds for kernelized metric learning from triplet comparisons. Second, the linear metric learning analysis of \cite{mason2017learning} requires that the number of items, $n$, be larger than the dimensionality, $d$, which limits its applicability. In contrast, our analysis, which also covers linear kernels, offers a more general framework even for linear metric learning from triplet comparisons.

\cite{mason2017learning} consider a fixed set of items in $\mathbb{R}^d$ and derive generalization bounds based on selecting triplets uniformly from those that can be generated from the fixed item set. Their analysis exploits the fact that the item set is fixed and requires the number of items $n$ to be larger than the dimensionality $d$. Note also that the true risk of \cite{mason2017learning} is defined with respect to a discrete uniform distribution over the $n\binom{n-1}{2}$ triplets that can be formed from the fixed set of $n$ items.
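
As a quick sanity check on this count (an illustrative calculation of ours, not one taken from \cite{mason2017learning}), assume that a triplet is specified by one reference item together with an unordered pair drawn from the remaining $n-1$ items. Under that reading, a fixed set of $n = 10$ items yields
\[
n\binom{n-1}{2} = 10 \cdot \binom{9}{2} = 10 \cdot 36 = 360
\]
distinct triplets, and the discrete uniform distribution in question places mass $1/360$ on each of them.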

Our setting differs significantly from that of \cite{mason2017learning}. We do not assume a fixed set of items, which would restrict generalization to the triplets drawn from this fixed set. Instead, each triplet query in our setting involves items drawn i.i.d.\ from an unknown distribution $\mathcal{D}$. Our true risk is defined over this unknown distribution, and the generalization bounds hold for triplets drawn from it. Thus, in addition to covering infinite-dimensional RKHSs, our analysis extends the generalization results for the linear kernel case to high dimensions (large $d$).

Given the difference in settings, our proof technique differs from that of \cite{mason2017learning}. To derive our sample complexity results, we turn our attention to the metric and exploit the fact that the true metric $L^*$ has a bounded Schatten-$p$ norm, which constrains how $L$ interacts with any random data. We use this constraint in conjunction with the Riesz Representation Theorem to further refine our analysis.
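
For reference, we recall the standard definition of the Schatten-$p$ norm. The notation $\|\cdot\|_{S_p}$ and $\sigma_i(\cdot)$ used here is an assumption made for this remark and may differ from the notation used elsewhere in the paper. For a compact linear operator $L$ with singular values $\sigma_1(L) \ge \sigma_2(L) \ge \dots \ge 0$ and $1 \le p < \infty$,
\[
\|L\|_{S_p} = \Bigl( \sum_{i} \sigma_i(L)^p \Bigr)^{1/p},
\qquad
\|L\|_{S_\infty} = \sup_i \sigma_i(L).
\]
The case $p = 1$ gives the nuclear (trace) norm, $p = 2$ gives the Hilbert--Schmidt (Frobenius) norm, and $p = \infty$ gives the operator norm.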
