\section{MODEL}
\label{sec:model}






For a query $Q = (\vx_i, \vx_j)$ and the corresponding oracle answer $Y$, let $p_{\vx_i, \vx_j, \vx_t} = P(Y=\vx_i \mid Q=(\vx_i,\vx_j), \vx_t)$ denote the probability of the outcome $Y = \vx_i$, i.e., ``$i$ is closer than $j$ to $t$''.
In this paper, we discuss the advantages of a scale-free oracle model, for which
$p_{c \cdot \vx_i, c \cdot \vx_j, c \cdot \vx_t} = p_{\vx_i, \vx_j, \vx_t}$ for all $c > 0$.
To the best of our knowledge, among the models studied in the existing machine-learning literature, only \cite{tamuz2011adaptively} have proposed such a scale-invariant choice model:
\begin{align}
	p^\text{CKL}_{\vx_i, \vx_j, \vx_t} = \frac{\|\vx_j - \vx_t\|^2}{\|\vx_i - \vx_t\|^2 + \|\vx_j - \vx_t\|^2}. \label{ckl}
\end{align}
A shortcoming of (\ref{ckl}) is its high sensitivity to the ``curse of dimensionality'': the probability of error grows quickly with $d$. 
Indeed, for a fixed target point and two query points sampled uniformly at random from a ball around the target, the probability that the oracle (\ref{ckl}) chooses the closer of the two points decays to $1/2$ as $d \rightarrow \infty$.
This makes comparison-based searching difficult, because most queries would provide almost no information about the target's location.
We propose a simple generalization of (\ref{ckl}) that addresses this shortcoming:
\begin{align}
\label{eq:gammackl_lucas}
	p_{\vx_i, \vx_j, \vx_t} = \frac{\|\vx_j - \vx_t\|^\gamma}{\|\vx_i - \vx_t\|^\gamma + \|\vx_j - \vx_t\|^\gamma},
\end{align}
with $\gamma > 0$.
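As a small worked illustration (the shorthand $r_i = \|\vx_i - \vx_t\|$ and $r_j = \|\vx_j - \vx_t\|$ and the numerical values below are introduced here only for this example), dividing the numerator and denominator of (\ref{eq:gammackl_lucas}) by $r_j^\gamma$, with $r_j > 0$, shows that the model depends on the two distances only through their ratio:
\begin{align*}
	p_{\vx_i, \vx_j, \vx_t} = \frac{1}{1 + (r_i / r_j)^\gamma}.
\end{align*}
For instance, if $r_i = 1$ and $r_j = 2$, the closer point $\vx_i$ is chosen with probability $2/3$ for $\gamma = 1$, $4/5$ for $\gamma = 2$, and $1024/1025$ for $\gamma = 10$.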
The parameter $\gamma$ controls the power of the oracle independently of the embedding dimension $d$.
To see this, note that when $\gamma$ is fixed and $d \to \infty$, the probability (\ref{eq:gammackl_lucas}) for a pair of points $\vx_i, \vx_j$ sampled uniformly at random from a ball around the target goes to $1/2$.
On the other hand, when $\gamma \to \infty$ and $d$ is fixed, this probability becomes an indicator function: $p_{\vx_i, \vx_j, \vx_t}\to \mathcal{I} \left\{ \|\vx_i - \vx_t\| < \|\vx_j - \vx_t\| \right\}$.
This suggests that as the dimension $d$ of the space grows, the new model should enable us to control the average outcome bias by scaling the parameter $\gamma$ accordingly. In the following theorem, we show that a linear scaling relationship between $\gamma$ and $d$ achieves this: %

\begin{theorem}
\label{gamma-d}
	Consider a $d$-dimensional ball $\mathcal{B} \subset \R^d$ of radius 1. Let the target point $\targetx$ be the center of $\mathcal{B}$. 
 For two points $\vx_a,\vx_b$ sampled uniformly from $\mathcal{B}$, let $p_Q$ be the probability of the correct answer on a query $Q = (\vx_a,\vx_b)$ given the target $\targetx$.
        For any $c_2 \in (\frac{1}{2}, 1)$ there is a constant $c_1 > 0$ such that if $\gamma$ grows linearly with $d$, $\gamma = c_1 \embdim + o(\embdim)$, then $p_Q \rightarrow c_2$.
\end{theorem}
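
The following is a short sketch of the intuition behind the linear scaling, not a substitute for the proof. It assumes that $p_Q$ is read as an expectation over the two sampled points and that $\gamma = c_1 \embdim$ exactly; the symbols $r_a$, $r_b$, $U_a$, $U_b$ are introduced here only for this sketch. Let $r_a = \|\vx_a - \targetx\|$ and $r_b = \|\vx_b - \targetx\|$. For a point sampled uniformly from the unit ball, $P(r_a \le s) = s^{\embdim}$ for $s \in [0,1]$, so $U_a = r_a^{\embdim}$ is uniform on $[0,1]$, and likewise $U_b = r_b^{\embdim}$. Hence $r_a^\gamma = U_a^{c_1}$ and $r_b^\gamma = U_b^{c_1}$, and
\begin{align*}
	\E[p_Q] = \E \left[ \frac{\max(U_a^{c_1}, U_b^{c_1})}{U_a^{c_1} + U_b^{c_1}} \right] = \int_0^1 \frac{dv}{1 + v^{c_1}},
\end{align*}
where the last step uses that $\min(U_a, U_b) / \max(U_a, U_b)$ is itself uniform on $[0,1]$.
The right-hand side does not depend on $\embdim$. It is continuous and increasing in $c_1$, tends to $1/2$ as $c_1 \to 0$ and to $1$ as $c_1 \to \infty$, and therefore takes every value strictly between $1/2$ and $1$; for example, $c_1 = 1$ gives $\ln 2 \approx 0.693$.
For a fixed $\gamma$, the effective exponent $\gamma / \embdim$ vanishes as $\embdim \to \infty$, which matches the decay to $1/2$ described above.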

We now give some intuition for when the geometric structure of a set of queries is rich enough to identify the target $\vx_t \in \R^d$ under the $\gamma$-CKL model.
In particular, the following proposition (proven in the appendix) establishes an identifiability condition for the target $\vx_t$, given a finite set of queries $\hat{\mathcal{Q}} = \{\hat{Q}_1, \hat{Q}_2, \dots, \hat{Q}_L \}$ whose exact answer probabilities are known (or, alternatively, that are each repeated infinitely many times, so that the answer probabilities can be estimated exactly).
Each query constrains the locus of $\vx_t$ to a $(d-1)$-dimensional sphere; if these spheres intersect in a single point, that point is $\vx_t$ (cf. Fig.~\ref{fig:spheres-plot}).
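To make the sphere constraint concrete, here is a brief derivation for a single generic query $(\vx_a, \vx_b)$, under the assumption that its answer probability $p \in (0,1)$ is known exactly (the symbols $p$ and $c$ are used here only for this illustration). Inverting (\ref{eq:gammackl_lucas}) gives the distance ratio
\begin{align*}
	c = \frac{\|\vx_a - \vx_t\|}{\|\vx_b - \vx_t\|} = \left( \frac{1 - p}{p} \right)^{1/\gamma},
\end{align*}
and squaring the constraint $\|\vx - \vx_a\| = c \, \|\vx - \vx_b\|$ on a candidate target $\vx$ yields
\begin{align*}
	(1 - c^2) \|\vx\|^2 - 2 \vx^\top (\vx_a - c^2 \vx_b) + \|\vx_a\|^2 - c^2 \|\vx_b\|^2 = 0.
\end{align*}
For $c \neq 1$ this is the equation of a sphere in $\R^d$; for $c = 1$, i.e., $p = 1/2$, it degenerates to the hyperplane bisecting $\vx_a$ and $\vx_b$.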

\begin{proposition}
\label{spheres}
	Assume that $\Omega \subset \R^d$ is a $d$-dimensional compact set and that the target $\vx_t$ is sampled uniformly at random from $\Omega$. Consider an infinite sequence of queries $\mathcal{Q} = \{ Q_0,Q_1,\dots \}$ posed to the oracle, where each $Q_i \in \hat{\mathcal{Q}} = \{\hat{Q}_1, \hat{Q}_2, \dots, \hat{Q}_L \}$ and each $\hat{Q}_i$, $i=1,2,\dots,L$, is repeated infinitely many times. For each $\hat{Q}_i = (\hat{\vx}^a_i, \hat{\vx}^b_i)$, let $c_i = \|\hat{\vx}^a_i - \vx_t\|/\|\hat{\vx}^b_i - \vx_t\|$ and $\hat{\vz}_i = (c_i \hat{\vx}^b_i - \hat{\vx}^a_i)/(1 - c_i)$. If $\hat{\mathcal{Q}}$ satisfies $\mathrm{rank}(\mZ) = d$, where $\mZ$ is the $d \times (L-1)$ matrix whose columns are the vectors $\hat{\vz}_i - \hat{\vz}_L$, $i = 1,\dots,L-1$, then $\argmax_{\vx \in \Omega} \E [p(\vx \mid Y_{1:m})] \to \vx_t$ as $m \to \infty$.
\end{proposition}

A natural question is whether the scale-free model we propose is a reasonable proxy for human comparisons.
In Section~\ref{sec:eval_datasets}, we study this question empirically on several real-world datasets.
We learn latent embeddings by maximizing the product of likelihoods (\ref{eq:gammackl_lucas}) on a training set, and evaluate the accuracy on a held-out set.
We answer the question in the affirmative, and find that the addition of the $\gamma$ parameter enables our model to perform favorably when compared to other commonly used choice models.
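
For concreteness, one way to write this objective is sketched below. It assumes that the training set is a collection $\mathcal{D}$ of triplets $(i, j, t)$, each recording that item $i$ was chosen over item $j$ for target $t$; the symbols $\mathcal{D}$ and $\ell$ are notation introduced here for illustration only. Under this assumption, maximizing the product of likelihoods (\ref{eq:gammackl_lucas}) is equivalent to maximizing the log-likelihood
\begin{align*}
	\ell = \sum_{(i,j,t) \in \mathcal{D}} \left[ \gamma \log \|\vx_j - \vx_t\| - \log \left( \|\vx_i - \vx_t\|^\gamma + \|\vx_j - \vx_t\|^\gamma \right) \right]
\end{align*}
over the embeddings.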

In the next section, we first focus on the search problem and assume that the item embeddings are known.