\section{INTRODUCTION}
\label{sec:intro}
Searching a database in order to find a target item via some explicit query, such as one or several keywords, is a well-studied problem in information retrieval (IR).
However, depending on the data type, it can be difficult or inefficient to formulate an explicit query.
For example, a witness to a crime working with the police typically does not sketch the face of a suspect; instead, she provides feedback on a sequence of images to gradually arrive at a faithful approximation of the suspect's face.
This is an example of {\em interactive comparison-based search}, where the user navigates towards the target item sequentially \citep{tschopp2011randomized,canal2019active,chumbalov2020scalable}.
In this approach, the user does not formulate an explicit query; rather, she answers a set of simple similarity queries with respect to the target: Among two items $i$ and $j$ provided by the system, which is closer to the intended target $\target$?
We refer to the outcome of such a query as a {\em triplet} $(i,j;t)$: among $\{i,j\}$, the user considered $i$ more similar to $t$.

The central component in such a system is a probabilistic oracle model that encapsulates how users answer such queries.
Most approaches, including ours, posit that items live in some low-dimensional feature space.
The embeddings of items $(\emb_1,\emb_2,\dots,\emb_\numitems)$ in this feature space determine the noisy outcomes of triplet queries.
Depending on the scenario, these embeddings can be derived from explicit item features (e.g., describing the geometry of a face), or they can be considered latent and estimated from past triplet data \citep{tamuz2011adaptively, van2012stochastic, chumbalov2020scalable}.
A {\em search algorithm} then presents a pair of items to the user, collects feedback, and repeats this process, until it can guess the target. %


\citet{chumbalov2020scalable} posit a {\em Probit} oracle model that assumes that the probability of answering $i$ or $j$ depends on the distance of $t$ from the bisecting hyperplane between $\emb_i$ and $\emb_j$, relative to a noise parameter $\sigma_e$. %
They develop a search algorithm that maintains a Gaussian belief distribution over the embedding space, which captures the current knowledge about $\emb_t$.
Each query maximizes the information gain relative to this belief distribution, until the target is guessed correctly.
A drawback inherent to their oracle model is that it scales poorly to large $\numitems$: once the belief distribution starts concentrating, the information contained in queries decreases.
For example, suppose for exposition's sake that the algorithm has narrowed down the target to two candidates $\target'$ and $\target''$, and that the distance $\|\emb_{\target'}-\emb_{\target''} \|$ is small, relative to the noise parameter $\sigma_e$.
Then {\em any possible query pair} $(i,j)$ generates answers relative to $\target'$ and to $\target''$ that are nearly indistinguishable (i.e., they are Bernoulli random variables whose parameters are close).
This means that the rate at which the belief distribution concentrates around $\emb_\target$ decreases, slowing down progress of the search.
This leads to an unfavorable scaling of the expected search cost when $\numitems$ grows large.\footnote{We note that simply reducing $\sigma_e$ does not help, because this would make macroscopic queries too certain.}
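
To make this concrete, the following is a minimal sketch under one standard way of writing the Probit oracle; the symbols $h_{ij}$ and $\Phi$ are our notation for this illustration, and the exact parameterization in \citet{chumbalov2020scalable} may differ.
Let
\[
  h_{ij}(\emb) \;=\; \frac{\|\emb_j - \emb\|^2 - \|\emb_i - \emb\|^2}{2\,\|\emb_i - \emb_j\|}
\]
denote the signed distance from a point $\emb$ to the hyperplane bisecting $\emb_i$ and $\emb_j$ (positive when $\emb$ is closer to $\emb_i$), and assume that a user with target at $\emb$ answers $i$ with probability $\Phi\bigl(h_{ij}(\emb)/\sigma_e\bigr)$, where $\Phi$ is the standard normal distribution function.
Because $h_{ij}$ is affine in $\emb$ with a unit-norm gradient, and because the derivative of $\Phi$ is at most $1/\sqrt{2\pi}$, we obtain
\[
  \left| \Phi\!\left(\frac{h_{ij}(\emb_{\target'})}{\sigma_e}\right) - \Phi\!\left(\frac{h_{ij}(\emb_{\target''})}{\sigma_e}\right) \right|
  \;\le\; \frac{\bigl| h_{ij}(\emb_{\target'}) - h_{ij}(\emb_{\target''}) \bigr|}{\sigma_e \sqrt{2\pi}}
  \;\le\; \frac{\|\emb_{\target'} - \emb_{\target''}\|}{\sigma_e \sqrt{2\pi}}
\]
for every query pair $(i,j)$.
Under these assumptions, the two Bernoulli parameters differ by at most a quantity proportional to $\|\emb_{\target'} - \emb_{\target''}\| / \sigma_e$, regardless of how the query is chosen.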







In this paper, we argue that it is plausible to assume a more favorable oracle model.
Specifically, we posit that the probability of choosing $i$ over $j$ is {\em scale-invariant} or {\em self-similar}, i.e.,
that it depends on the item embeddings only via the ratio of $\|\emb_{i}-\emb_\target\|$ to $\|\emb_{j}-\emb_\target\|$.
In other words, comparing two very dissimilar items with respect to a target that is very dissimilar from both is no harder (nor easier) than comparing two quite similar items with respect to a nearby target.
There is some evidence that this model reflects psychological laws of perception \citep{chater1999scale, laming1986sensory}, and we provide additional experimental evidence on this point in Section \ref{sec:evaluation}.
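
As an illustration, one functional form with this property, and consistent with the CKL special case $\gamma = 2$ mentioned in the related work, is sketched here; we state it as an assumption for exposition only, the model itself being described in Section \ref{sec:model}.
Writing $r = \|\emb_i - \emb_\target\| / \|\emb_j - \emb_\target\|$ for the distance ratio (our notation), consider
\[
  \Pr\bigl[\, i \text{ chosen over } j \,\bigr]
  \;=\; \frac{\|\emb_j - \emb_\target\|^{\gamma}}{\|\emb_i - \emb_\target\|^{\gamma} + \|\emb_j - \emb_\target\|^{\gamma}}
  \;=\; \frac{1}{1 + r^{\gamma}},
  \qquad \gamma > 0 .
\]
Multiplying all three embeddings by a common factor $c > 0$ leaves $r$, and hence this probability, unchanged, which is the scale invariance described above.
The probability exceeds $1/2$ exactly when $r < 1$, i.e., when $\emb_i$ is closer to the target, and a larger $\gamma$ makes the answers less noisy for a given ratio.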



Under a perfectly scale-free oracle, the information required to halve the volume of the belief region does not depend on the scale of the current belief region.
This suggests that there is hope that this volume can shrink exponentially fast with the number of queries.
Indeed, a central contribution in this paper is an algorithm that achieves exponential convergence.
In the noisy setting we study, this is non-trivial, because there is always the possibility of errors in oracle answers, such that the current belief moves too far away from the target.
We solve this with a backtracking strategy that detects the occurrence of an error based on subsequent queries, and expands the belief region in order to ``recapture'' the target.
We prove the exponential convergence of the expected distance to the target via an equivalence with a biased random walk on an infinite graph, which captures the containment relationships among the family of belief regions available to the algorithm.
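
The scale-free intuition above can be illustrated with a back-of-the-envelope calculation; this is a noiseless heuristic under simplifying assumptions, not the analysis of the algorithm, and the symbols $k$, $V_m$, and $r_m$ are introduced here only for illustration.
Suppose that a fixed number $k$ of queries suffices to halve the volume of the belief region at any scale, and that the region remains roughly ball-shaped in $\mathbb{R}^\embdim$, so that its volume $V_m$ after $m$ queries scales as $r_m^{\embdim}$ in its radius $r_m$.
Then
\[
  V_m \;\approx\; V_0 \, 2^{-m/k}
  \qquad\Longrightarrow\qquad
  r_m \;\approx\; r_0 \, 2^{-m/(k\,\embdim)} ,
\]
so that reaching a precision $\varepsilon$ would take on the order of $k \, \embdim \, \log_2 (r_0/\varepsilon)$ queries.
With noisy answers, erroneous outcomes can break this idealized picture, which is why the backtracking step is needed.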






{\bf Related work.}
A number of different triplet comparison models have been introduced and studied in the machine-learning literature.
Their main focus is on learning an embedding from comparison triplet data, which then allows predictions for unseen triplets.
In \cite{van2012stochastic}, the authors propose the t-STE model and capture the similarities between items via a Student-$t$ kernel, whose power-law tail confers robustness to outlier triplets.
This model shares the drawback of the Probit model in that a narrow query ($\|\emb_{i}-\emb_{j}\| \rightarrow 0$) provides vanishing information, regardless of the target location.
Later, \citet{amid2015multiview} generalize the idea of the t-STE model by allowing multiple representations of the same object in several different low-dimensional maps. 
The scale-invariant CKL model, introduced in \cite{tamuz2011adaptively}, corresponds to the special case $\gamma = 2$ of the model considered in this paper. 
The Probit model is explored in \cite{chumbalov2020scalable} and \cite{canal2019active}, where the output probability is a function of the distance between the target and the hyperplane bisecting the two query points.
For a thorough discussion and comparison of (both noisy and noiseless) comparison triplet oracles and the embedding techniques they induce, see \cite{vankadara2023insights}.



A number of papers consider the problem of searching for a target using a sequence of {\em noiseless} comparison queries; see, for example, \citet{dasgupta2005analysis,nowak2008generalized}.
\citet{karbasi2012comparison} assume that all distances between pairs of items are known.
Their analysis assumes either a noiseless oracle, or an oracle with constant error probability, independent of the distances between query items and target.
This uniform noise model is not realistic for most applications, because it essentially assumes that every query conveys the same amount of information, independent of the embeddings $\emb_i$, $\emb_j$, and $\emb_t$; if the true oracle is different, this assumption leads to inefficient search algorithms.
Extensions of this line of work include oracles with ternary output including ``I don't know'' for similar query items \citep{kazemi2018comparison}, and larger query sets from which the most similar item is selected \citep{karbasi2015small}. 
Although these approaches are similar in spirit to the problem considered here, the resulting search algorithms are not robust to noise, as they are unable to recover from erroneous query outcomes as the search progresses.
Finally, there exists a line of work where noiseless triplet queries are used for efficient nearest-neighbor search in high-dimensional spaces \citep{haghiri2017comparison}.



The problem of searching in a space with {\em noisy} similarity queries is studied in \cite{cox2000bayesian, fang2005experiments, ferecatu2007interactive, suditu2012iterative,garnett2012bayesian}, using different comparison models in a fully Bayesian framework. 
To find the next query to ask, these methods usually aim to maximize the information gain by performing an exhaustive search over all pairs of items, which becomes prohibitively expensive for large $\numitems$.
This computational bottleneck was addressed in \cite{canal2019active} and \cite{chumbalov2020scalable}, where the authors propose search schemes with more favorable tradeoffs between query complexity and computational complexity: they approximate the knowledge about $\emb_t$ with a parametric distribution, which results in much better scalability.
Comparison queries have also been explored in other active-learning scenarios, where, rather than finding one target item (or target point in a feature space), the goal is to determine a hypothesis function $h$ that assigns binary labels to all items in the database, assuming the two classes are separable by an unknown hyperplane \citep{kane2017active,nowak2009noisy}.


The remainder of this paper is structured as follows.
In Section \ref{sec:model}, we describe the $\gamma$-CKL model and explore the scaling relationship between $\gamma$ and the embedding dimension $\embdim$ at a fixed error rate.
In Section \ref{sec:search}, we give the search algorithm for the dense case, i.e., when every point $\emb \in \Omega \subset \mathbb{R}^\embdim$ is a potential target.
We formally prove that this algorithm shrinks the expected distance to the target exponentially fast.
In Section \ref{sec:evaluation}, we compare $\gamma$-CKL against commonly used choice models on a series of comparison datasets and show the results of a comprehensive user study that validates the performance of the $\gamma$-CKL model. We also present synthetic experiments that confirm the exponential convergence rate of our new algorithm.

\begin{figure*}[htp]
    \centering
    \subfloat[One query $(\vx_a, \vx_b)$]{\includegraphics[width=4.5cm, height=4.5cm]{plots/spheres-0.png}} 
    \qquad
    \subfloat[rank($\mZ$) = 2]{\includegraphics[width=4.5cm, height=4.5cm]{plots/spheres-1.png}} 
    \qquad
    \subfloat[rank($\mZ$) = 1]{\includegraphics[width=4.5cm, height=4.5cm]{plots/spheres-2.png}}
    \caption{Illustration of the result of Proposition~\ref{spheres} in $\mathbb{R}^2$. (a) For each query, the set of points in $\Omega$ that maximizes the expected log-likelihood is, geometrically, a sphere with center $\vz$ that contains $\vx_t$. (b) When the sphere centers $\{ \vz_i \}$ corresponding to the queries $\hat{\mathcal{Q}}$ span a volume in $\mathbb{R}^2$, these spheres intersect at exactly one point, $\vx_t$. (c) Otherwise, there are multiple points of intersection, and $\vx_t$ is not identifiable.}
    \label{fig:spheres-plot}
\end{figure*}





















