\section{EMPIRICAL EVALUATION} \label{sec:evaluation}

\subsection{Offline Comparisons Datasets}\label{sec:eval_datasets}


\begin{figure*}[htp]
\centering
\subfloat{\includegraphics[scale=0.45]{plots/m_artists_acc.png}}
\subfloat{\includegraphics[scale=0.45]{plots/food_acc.png}}
\subfloat{\includegraphics[scale=0.45]{plots/wit_acc.png}}
\caption{Quality of the embeddings produced using different comparison models.
Accuracy is measured on held-out triplets using 10-fold cross-validation. Our $\gamma$-CKL models match or outperform the competing oracle models.}
\label{fig:wit_emb_plot}
\end{figure*}

To validate how well the proposed model fits real-world data, we compare it with oracle choice models from the literature, namely t-STE, CKL, and Probit, on three real-world triplet-comparison datasets: Musical Artists \cite{ellis2002quest}, containing 9'107 triplets over $n=400$ musicians; Food \cite{wilber2014cost}, containing 190'376 triplets over $n=100$ food images; and Movie Actors \cite{chumbalov2020scalable}, containing 50'026 triplets over $n=552$ actors.
For each dataset and each model, we learned embeddings for several values of the dimension $d$ and performed 10-fold cross-validation to compute the accuracy on the held-out triplets.
The hyperparameters of each model were optimized, and we report the best-performing configuration for each $d$. Across the three datasets, $\gamma$-CKL correctly predicts between 84\% and 86\% of triplets (see Fig.~\ref{fig:wit_emb_plot}). For Musical Artists, $\gamma$-CKL is on par with t-STE and outperforms Probit and CKL. For Food, $\gamma$-CKL is on par with t-STE and Probit and significantly outperforms CKL. For Movie Actors, $\gamma$-CKL outperforms all of its competitors. Compared to the original CKL, the $\gamma$-CKL model benefits from the general $\gamma$ parameter already in low dimensions. These results suggest that the proposed oracle model reflects real user behaviour on comparison tasks well. We also note that increasing $d$ improves the quality of the learned embedding for $\gamma$-CKL and that, as $d$ increases, the best-performing values of $\gamma$ also tend to increase, which is consistent with Theorem~\ref{gamma-d} (see Appendix).
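
For concreteness, the holdout accuracy can be sketched as follows. This is an illustration under assumed notation, not necessarily the exact evaluation protocol: $\mathcal{T}_{\mathrm{test}}$ denotes one held-out fold of triplets $(i,j,k)$, each read as ``item $i$ is more similar to item $j$ than to item $k$'', and $\hat{x}_i$ denotes the learned $d$-dimensional embedding of item $i$.
\[
\mathrm{acc}(\mathcal{T}_{\mathrm{test}})
= \frac{1}{|\mathcal{T}_{\mathrm{test}}|}
\sum_{(i,j,k)\in\mathcal{T}_{\mathrm{test}}}
\mathbf{1}\left[\, \|\hat{x}_i-\hat{x}_j\| < \|\hat{x}_i-\hat{x}_k\| \,\right].
\]
Under this reading, the reported accuracy is the average of $\mathrm{acc}(\mathcal{T}_{\mathrm{test}})$ over the 10 folds, and an embedding that carries no information about the comparisons would be expected to score about 50\%.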



\subsection{Interactive User-Study}
\label{sec:interactive-us}
We are interested in the performance of a scale-free oracle model for the purpose of interactive search.
The current state of the art is \textsc{GaussSearch}, as benchmarked in a user study by \cite{chumbalov2020scalable}.
To compare $\gamma$-CKL to the Probit model underpinning \textsc{GaussSearch},
we implement an algorithm, \textsc{$\gamma$-CKLSearch}, that is similar in spirit to \textsc{GaussSearch} but based on the likelihood predicted by $\gamma$-CKL (see Appendix).
We then compare the two algorithms in a user study designed to mimic the setting of \cite{chumbalov2020scalable}.
We not only find that \textsc{GaussSearch} performs slightly better than in the original study (thus validating the state-of-the-art),
but also observe a significantly better search performance with $\gamma$-CKL. 

Our set of items contains $n=513$ pictures of famous movie actors\footnote{A demo version of this experiment is available under \url{https://who-is-th.at}}. 
At each step of a search, the user is presented with four pictures of faces of previously unseen actors and is asked to choose the one that most resembles the target.
The search is complete once the user finds the target, i.e., when the picture of the target's face appears among the four displayed pictures.
An embedding of actors' faces has been learned individually for each algorithm, from triplets collected prior to the experiment.

Our study is designed with controlled randomization. Each user sees a target at most once. Each target is searched for twice, once with algorithm \textsc{$\gamma$-CKLSearch} and once with \textsc{GaussSearch}.
This corresponds to an across-subject design and reduces item-related bias. To reduce user-related bias, we also use a within-subject design, where each user performs the same number of searches with each of the two algorithms.
The order in which searches are seen is random. Users are not aware of the algorithm they are testing. 
In total, we recruited 24 participants. We collected 207 search trajectories, 104 with \textsc{GaussSearch} and 103 with algorithm \textsc{$\gamma$-CKLSearch}.
Our new method outperforms \textsc{GaussSearch}: with \textsc{$\gamma$-CKLSearch}, a user needs on average \textbf{18.83 $\pm$ 1.257} queries to find the target, whereas with \textsc{GaussSearch} a user needs on average \textbf{22.08 $\pm$ 1.658} queries.
The \textsc{$\gamma$-CKLSearch} algorithm also appears to ask queries that are cognitively easier for humans to answer:
on average, participants spent 11.62 seconds deciding on a query during a search with \textsc{$\gamma$-CKLSearch}, versus 13.19 seconds for a query from \textsc{GaussSearch}.
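
To put these averages in relative terms, the following is simple arithmetic on the reported means only; it ignores the reported $\pm$ terms and is not a significance test.
\[
\frac{22.08 - 18.83}{22.08} \approx 0.147,
\qquad
\frac{13.19 - 11.62}{13.19} \approx 0.119 .
\]
That is, searches with \textsc{$\gamma$-CKLSearch} required roughly 15\% fewer queries on average, and each query took roughly 12\% less time to answer.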




\subsection{Synthetic data} \label{sec:synthetic_exp}

We provide an open-source implementation of our algorithm, based on PyTorch \cite{paszke2019pytorch}, in the supplementary material.
A synthetic evaluation of our search algorithm is shown in Figure~\ref{fig:synthetic search_v2}. To illustrate the robustness of our algorithm, we show a variety of combinations of $\gamma$ and $\embdim$; for each, we use 50 independent runs to compute confidence intervals. Our implementation follows Algorithm~\ref{alg:outer_inner_loop}, based on the heuristic from Section~\ref{sec:implementation}. Because the volume of a belief area scales with $O(\embdim)$, we present the distance to the target raised to the power $\embdim$, so that the convergence rate can be compared across different values of $\embdim$. Additional visualizations are also included in the supplementary material.
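
One way to read this rescaling is sketched below, under the simplifying assumption (ours, for illustration only) that the belief area is approximately a Euclidean ball of radius $r$ in $\embdim$ dimensions. The symbols $r$, $V_t$, $r_t$, $\rho$, and $t$ are introduced here and do not appear elsewhere in the paper. The volume of such a ball is
\[
V_{\embdim}(r) = \frac{\pi^{\embdim/2}}{\Gamma\left(\frac{\embdim}{2}+1\right)}\, r^{\embdim},
\]
so for fixed $\embdim$ the quantity $r^{\embdim}$ is proportional to the volume. If the volume shrinks by a constant factor $\rho \in (0,1)$ per query, that is, $V_t = \rho^{t} V_0$ after $t$ queries, then
\[
r_t^{\embdim} = \rho^{t}\, r_0^{\embdim}
\qquad\text{and}\qquad
r_t = \rho^{t/\embdim}\, r_0 .
\]
The raw distance therefore decays more slowly as $\embdim$ grows, whereas the distance raised to the power $\embdim$ decays at the same rate $\rho$ for every $\embdim$, which is what makes the curves comparable.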

\begin{figure}[t]
\centerline{\includegraphics[scale=0.6]{plots/three_fillbetween.png}}
\caption{Exponential convergence across a range of dimensions and values for $\gamma$.}
\label{fig:synthetic search_v2}
\end{figure}





    



    
    
