\input{supp-model}

\section{Full proofs for exponential convergence}

\begin{proof}[Proof for Lemma \ref{lemma:region_rw}]
Let $\mathcal{D}$ be the decision made by the algorithm. This is a random variable which can take values in \((B, P_1, P_2, \dots, P_{5^\embdim})\), for backtracking or proceeding to one of the children of \(\region\).
When arriving at a region $\region$, the algorithm discards information about previous query outcomes.
Then it asks a series of queries, prescribed by our Inner Loop algorithm.
Once the query outcomes have been observed, the decision is deterministic.
Therefore, to describe the distribution of $\mathcal{D}$ it is sufficient to describe the distribution of queries and query outcomes.
The queries that we ask depend only on the current region.
The outcomes are conditionally independent, given a target location.
Therefore, the distribution of $\mathcal{D}$ only depends on the current region and the latent $\targetx$. 
This means that the decision to proceed or backtrack is Markovian.
Consequently, the sequence of regions $\region_\searchstep$ is a random walk.
\end{proof}

\begin{proof}[Proof for Lemma \ref{lemma:stoch_upper_bound}]
    We show an equivalent statement:
    There exists a coupling $\tilde{\region}_\searchstep$ and $\tilde{\rwbd}_\searchstep$ such that $\forall s\geq 0: \mathbb{P}[\z{\tilde{\region}_\searchstep}{ \targetx} \leq \tilde{\rwbd}_\searchstep] = 1$ and the marginal distributions are unchanged, $F_{\region_\searchstep} = F_{\tilde{\region}_\searchstep}$ and $F_{\rwbd_\searchstep} = F_{\tilde{\rwbd}_\searchstep}$.
    We prove this by induction on $s$.
    
    \textit{Induction start:}\\
    For $s=0$ both quantities are the same constant: $\rwbd_0 = \z{\region_0}{\targetx}$.
    It follows immediately that $\mathbb{P}[\z{\region_0}{\targetx} = \tilde{\rwbd}_0] = 1$.
    
    \textit{Induction step:}\\
    We are given a random variable $\tilde{\region}_\searchstep$ which has the same distribution as $\region_\searchstep$, and a random variable $\tilde{\rwbd}_\searchstep$ which has the same distribution as $\rwbd_\searchstep$,
    and we know that $\mathbb{P}[\z{\tilde{\region}_\searchstep}{\targetx} \leq \tilde{\rwbd}_\searchstep]=1$.
    We will now construct two random variables $\tilde{\region}_{\searchstep+1}$ and $\tilde{\rwbd}_{\searchstep+1}$ for which it holds that $\mathbb{P}[\z{\tilde{\region}_{\searchstep+1}}{\targetx} \leq \tilde{\rwbd}_{\searchstep+1}]=1$.
    
    
    Let $u \sim \uncertain[0,1]$ be a sample from the uniform distribution on $[0,1]$.
    We use this single sample to couple the two random walks.
    Depending on $u$ and the current state $\tilde{\region}_\searchstep$ of the random walk, the following transition is taken:
    \begin{itemize}
      \item $\tilde{\region}_\searchstep$ is green. This means $\z{\tilde{\region}_{\searchstep}}{\targetx} =0$
       \begin{itemize}
        \item if $u \leq \pdnrt(\tilde{\region}_\searchstep, \targetx)$, then proceed to a green child. This means $\z{\tilde{\region}_{\searchstep+1}}{\targetx}=0$.
        \item if $\pdnrt(\tilde{\region}_\searchstep, \targetx) < u \leq \pdnrt(\tilde{\region}_\searchstep, \targetx) + \puprt(\tilde{\region}_\searchstep, \targetx)$, then backtrack to the parent region. This means $\z{\tilde{\region}_{\searchstep+1}}{\targetx}=0$.
        \item otherwise, $\pdnrt(\tilde{\region}_\searchstep, \targetx) + \puprt(\tilde{\region}_\searchstep, \targetx) < u \leq 1$, and the walk strays to a red child region. Now we have $\z{\tilde{\region}_{\searchstep+1}}{\targetx}=1$.
      \end{itemize}
      \item $\tilde{\region}_\searchstep$ is red. This means $\z{\tilde{\region}_{\searchstep}}{\targetx} >0$
      \begin{itemize}
        \item if $u \leq \pupwr(\tilde{\region}_\searchstep, \targetx)$, then backtrack to the parent region. This means $\z{\tilde{\region}_{\searchstep+1}}{\targetx} - \z{\tilde{\region}_{\searchstep}}{\targetx} = -1$.
        \item if $\pupwr(\tilde{\region}_\searchstep, \targetx) < u \leq \precover(\tilde{\region}_\searchstep, \targetx)  + \pupwr(\tilde{\region}_\searchstep, \targetx)$, then recover by proceeding to a green child region.
        This means that $\z{\tilde{\region}_{\searchstep+1}}{\targetx}=0$.
        Recovering is only possible if one of the child regions contains the target.
        We know that the parent of a region is a superset of all the child regions, $\parent{\region} \supset \bigcup_{\child \in \children{\region}} \child$.
        Therefore, whenever a recovery transition is possible, backtracking must likewise lead to a green region.
        This shows that recovery is only possible when $\z{\tilde{\region}_{\searchstep}}{\targetx}=1$.
        Hence, in this case too, $\z{\tilde{\region}_{\searchstep+1}}{\targetx} - \z{\tilde{\region}_{\searchstep}}{\targetx} = -1$.
       \item otherwise, $\precover(\tilde{\region}_\searchstep, \targetx)  + \pupwr(\tilde{\region}_\searchstep, \targetx) < u \leq 1$, and the walk proceeds to a red child. This means $\z{\tilde{\region}_{\searchstep+1}}{\targetx} - \z{\tilde{\region}_{\searchstep}}{\targetx} \in \{0,1\}$.
      \end{itemize}
    \end{itemize}


    We now construct a coupled variable $\tilde{D}$ such that $\tilde{\rwbd}_{\searchstep+1} = \tilde{\rwbd}_s + \tilde{D}$.
    Since $\tilde{\rwbd}$ is a random walk on natural numbers, with a self-loop at 0, we need to distinguish between two scenarios:
    \begin{itemize}
      \item $\tilde{\rwbd}_s > 0$
      \begin{itemize}
        \item if $u \leq \frac{1+b}{2}$, then $\tilde{D}=-1$
        \item else, $\tilde{D}=1$
      \end{itemize}
      \item $\tilde{\rwbd}_s = 0$
      \begin{itemize}
        \item if $u \leq \frac{1+b}{2}$, then $\tilde{D}=0$
        \item else, $\tilde{D}=1$
      \end{itemize}
    \end{itemize}

    We will now show that the construction of $\tilde{D}$ ensures that $\mathbb{P}[\z{\tilde{\region}_{\searchstep+1}}{\targetx} \leq \tilde{D} + \tilde{\rwbd}_s]=1$.

    \textbf{Case 1}, $\tilde{\rwbd}_s = 0$:
    
    The induction hypothesis implies $\z{\tilde{\region}_s}{\targetx} = 0$, which in turn implies that $\tilde{\region}_s$ is a green region.
    In this case we know that $\tilde{D} \in \{0, 1\}$ and $\z{\tilde{\region}_{\searchstep+1}}{\targetx}\in \{0,1\}$.

    It holds that $\z{\tilde{\region}_{\searchstep+1}}{\targetx} =1 $ iff $\pdnrt(\tilde{\region}_\searchstep, \targetx) + \puprt(\tilde{\region}_\searchstep, \targetx) = 1 - \pstray(\tilde{\region}_\searchstep, \targetx) < u$.
    It holds that $\tilde{D} = 1$ iff $\frac{1+b}{2} < u$.
    From Assumption \ref{assump:bias} we know that for all possible regions $\region$ and targets $\targetx$, $1-\pstray(\region, \targetx) > \frac{1+b}{2}$.
    Therefore $\tilde{D} = 0 \implies u \leq \frac{1+b}{2} < 1-\pstray(\tilde{\region}_\searchstep, \targetx) \implies \z{\tilde{\region}_{\searchstep+1}}{\targetx} = 0$.
    Therefore  $\mathbb{P}[\z{\tilde{\region}_{\searchstep+1}}{\targetx} \leq \tilde{D} + \tilde{\rwbd}_s \mid \tilde{\rwbd}_s = 0]=1$.
    
    \textbf{Case 2},  $\tilde{\rwbd}_s > 0, \z{\tilde{\region}_{\searchstep}}{\targetx}=0$:

    Again, $\tilde{\region}_s$ is a green region.
    So we know that $\z{\tilde{\region}_{\searchstep+1}}{\targetx} \in \{0,1\}$,
    and $\z{\tilde{\region}_{\searchstep+1}}{\targetx} =1 $ iff $\pdnrt(\tilde{\region}_\searchstep, \targetx) + \puprt(\tilde{\region}_\searchstep, \targetx) = 1 - \pstray(\tilde{\region}_\searchstep, \targetx) < u$.

    Since $\tilde{\rwbd}_s>0$ and $\z{\tilde{\region}_{\searchstep}}{\targetx}=0$ we know $\tilde{D}\geq 0 \implies \z{\tilde{\region}_{\searchstep+1}}{\targetx} \leq \tilde{\rwbd}_s + \tilde{D}$.
    We only need to analyse the case of $\tilde{D}=-1$.
    We know that $\tilde{D}=-1 \implies u \leq \frac{1+b}{2}$.
    Using Assumption \ref{assump:bias}, $\z{\tilde{\region}_{\searchstep+1}}{\targetx} = 1 \implies u > 1-\pstray(\tilde{\region}_\searchstep, \targetx) >  1 - \frac{1-b}{2} = \frac{1+b}{2}$.
    These two conditions on $u$ cannot hold at the same time. We now know that $\tilde{D}=-1 \implies \z{\tilde{\region}_{\searchstep+1}}{\targetx} = 0$.
    Therefore  $\mathbb{P}[\z{\tilde{\region}_{\searchstep+1}}{\targetx} \leq \tilde{D} + \tilde{\rwbd}_s \mid \tilde{\rwbd}_s > 0, \z{\tilde{\region}_{\searchstep}}{\targetx}=0]=1$.

    \textbf{Case 3}, $\tilde{\rwbd}_s > 0, \z{\tilde{\region}_{\searchstep}}{\targetx}>0$:

    $\tilde{D}=-1$ implies $u \leq \frac{1+b}{2}$.
    From Assumption \ref{assump:bias} we know that $\forall \region, \targetx : \frac{1+b}{2} \leq \pupwr(\region, \targetx) + \precover(\region, \targetx)$.
    Therefore the event $\tilde{D} = -1$ implies that the walk backtracks or recovers, so $\z{\tilde{\region}_{\searchstep+1}}{\targetx} - \z{\tilde{\region}_{\searchstep}}{\targetx} = -1$.
    If instead $\tilde{D} = 1$, the bound is preserved because $\z{\tilde{\region}_{\searchstep}}{\targetx}$ increases by at most $1$ in a single step.
    Therefore  $\mathbb{P}[\z{\tilde{\region}_{\searchstep+1}}{\targetx} \leq \tilde{D} + \tilde{\rwbd}_s \mid \tilde{\rwbd}_s > 0, \z{\tilde{\region}_{\searchstep}}{\targetx}>0]=1$.


We have shown that $\mathbb{P}[\z{\tilde{\region}_{\searchstep+1}}{\targetx} \leq \tilde{D} + \tilde{\rwbd}_s]=1$ in all three cases, which completes the induction step.
\end{proof}

    
\begin{proof}[Proof for Lemma \ref{lemma:number_of_errors}]
    Under Assumption \ref{assump:bias} we have shown $\z{\region_s}{\targetx} \preceq_{st.} \rwbd_\searchstep$.
    By the definition of stochastic ordering, $\mathbb{P}[\z{\region_s}{\targetx} \geq k] \leq \mathbb{P}[\rwbd_\searchstep \geq k]$ for all $k$,
    which is equivalent to $\mathbb{P}[\z{\region_s}{\targetx} \leq k] \geq \mathbb{P}[\rwbd_\searchstep \leq k]$ for all $k$.

    We now show the claim of the lemma for $\rwbd_\searchstep$; the same statement for $\z{\region_s}{\targetx}$ follows immediately.
  The proof is by induction on $s$ for the statement $\mathbb{P}[\rwbd_\searchstep > k] \le \left(\frac{1-b}{1+b}\right)^k$.
  The property holds trivially for $s = 0, k \ge 0$ and $k = 0, s \ge 0$.
  Assume that the property holds for a given $s$ and for all $k$.
  For any $k \ge 1$, we have
  \begin{align*}
  &\mathbb{P}[\rwbd_{\searchstep+1} > k]
      = \frac{1+b}{2} \mathbb{P}[\rwbd_\searchstep > k + 1] + \frac{1-b}{2} \mathbb{P}[\rwbd_\searchstep > k - 1] \\
      &\le\frac{1+b}{2} \left(\frac{1-b}{1+b}\right)^{k+1} + \frac{1-b}{2} \left(\frac{1-b}{1+b}\right)^{k-1}
       = \left(\frac{1-b}{1+b}\right)^k.
  \end{align*}
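  The final equality can be checked by factoring out the common power. A short worked version, writing $\rho = \frac{1-b}{1+b}$ as shorthand (a symbol introduced only for this remark):
  \begin{align*}
  \frac{1+b}{2}\rho^{k+1} + \frac{1-b}{2}\rho^{k-1}
      &= \rho^{k}\left(\frac{1+b}{2}\rho + \frac{1-b}{2}\rho^{-1}\right)\\
      &= \rho^{k}\left(\frac{1-b}{2} + \frac{1+b}{2}\right) = \rho^{k}.
  \end{align*}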
\end{proof}



\begin{proof}[Proof for Lemma \ref{lemma:stray_time}]
Let $\tau_{\rwbd=N} = \inf \{s>0 \mid \rwbd_\searchstep = 0, \rwbd_0 = N\}$ be the first time at which $\rwbd_\searchstep$ reaches 0 when started from $N$.
We have shown that the random walk $\rwbd$ can be used as a stochastic upper bound.
Therefore we know $\mathbb{E}[\tau_\region] \leq \mathbb{E}[\tau_{\rwbd=1}]$.

We will now calculate the expected value of this stopping time for $\rwbd$.

We first verify that this random walk is ergodic \cite{levin2017markov}.
Since \(b>0\) the random walk is positive recurrent. The self-loop at \(\rwbd=0\) makes it aperiodic. It is also irreducible, since every state can be reached from every other state.

Therefore it has a unique stationary distribution \(\pi\). We now calculate \(\pi\). The conditions on the distribution are:
\begin{align*}
&(1)\quad \pi_0 = \frac{1+b}{2}\pi_0 + \frac{1+b}{2}\pi_1\\
&(2)\quad \pi_1 =\frac{1-b}{2}\pi_0 + \frac{1+b}{2}\pi_2\\
&(3)\quad \pi_n = \frac{1-b}{2}\pi_{n-1} + \frac{1+b}{2}\pi_{n+1}, \quad n>1\\
&(4)\quad \sum_{i=0}^\infty \pi_i = 1
\end{align*}

We show \(\pi_n = (\frac{1-b}{1+b})^n\pi_0, n>0\) by induction:
\begin{align*}
&(1) \iff \pi_1 = \frac{1-b}{1+b}\pi_0\\
&(1) \& (2) \iff \pi_1 = \frac{1-b}{2}\pi_0 + \frac{1+b}{2}\pi_2 \\
&\iff \frac{2}{1+b}\pi_1 - \frac{1-b}{1+b}\pi_0 = \pi_2 \iff \pi_2 = (\frac{1-b}{1+b})^2 \pi_0\\
& (3) \iff \pi_{n-1}=\frac{1-b}{2}\pi_{n-2}+\frac{1+b}{2}\pi_n\\
& \iff (\frac{1-b}{1+b})^{n-1}\pi_0 - \frac{1-b}{2} (\frac{1-b}{1+b})^{n-2}\pi_0 = \frac{1+b}{2}\pi_n\\
& \iff (\frac{1-b}{1+b} - \frac{1-b}{2})(\frac{1-b}{1+b})^{n-2}\pi_0 = \frac{1+b}{2}\pi_n\\
& \iff (\frac{1-b}{1+b}\frac{2}{1+b} - \frac{1-b}{2}\frac{2}{1+b})(\frac{1-b}{1+b})^{n-2}\pi_0 = \pi_n\\
& \iff (\frac{2-2b-1+b^2}{(1+b)^2})(\frac{1-b}{1+b})^{n-2}\pi_0 = \pi_n\\
& \iff (\frac{1-b}{1+b})^{n}\pi_0 = \pi_n\\
\end{align*}

Summing the geometric series gives \(\sum_{i=0}^\infty \pi_i = \pi_0 \frac{1}{1 - \frac{1-b}{1+b}}= \pi_0 \frac{b+1}{2b}\).
Since this sum must equal $1$, we obtain
\(\pi_0 = \frac{2b}{b+1}\).

We can use the unique stationary distribution of \(\rwbd\) to compute expected return times.
As defined above, let $\tau_{\rwbd=N} = \inf \{s>0 \mid \rwbd_\searchstep=0, \rwbd_0 = N\}$.
For a positive recurrent chain with stationary distribution \(\pi\), the expected return time to state 0 is
 \(\mathbb{E}[\tau_{\rwbd=0}] = \frac{1}{\pi_0}=\frac{b+1}{2b}\).
We are interested in $\mathbb{E}[\tau_{\rwbd=1}]$.
Starting from \(\rwbd=0\) the walk must either follow the self loop or go to $\rwbd=1$.
This leads to the following equation:
\begin{align*}
\frac{1+b}{2} + \frac{1-b}{2} (\mathbb{E}[\tau_{\rwbd=1}]+1) &= \frac{1}{\pi_0} = \frac{b+1}{2b}\\
\implies \frac{1-b}{2} \mathbb{E}[\tau_{\rwbd=1}] &= \frac{b+1}{2b} - \frac{1-b}{2} - \frac{1+b}{2} = \frac{1-b}{2b}\\
\implies \mathbb{E}[\tau_{\rwbd=1}] &= \frac{1}{b}
\end{align*}
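
As a cross-check on the value $\frac{1}{b}$, a first-step analysis gives the same result without using $\pi$.
This is only a sketch: $h_N$ is shorthand introduced here for $\mathbb{E}[\tau_{\rwbd=N}]$, and we use that these expectations are finite because the walk is positive recurrent.
To reach $0$ from $2$ the walk must first reach $1$, and away from $0$ the step distribution does not depend on the state, so $h_2 = 2h_1$.
Conditioning on the first step from state $1$ then gives
\begin{align*}
h_1 &= 1 + \frac{1+b}{2}\cdot 0 + \frac{1-b}{2} h_2 = 1 + (1-b)\, h_1\\
\implies b\, h_1 &= 1 \implies h_1 = \frac{1}{b}.
\end{align*}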
\end{proof}



\begin{proof}[Proof for Theorem \ref{theorem:exp_v2}]

The first part of the claim follows immediately from Lemma \ref{lemma:number_of_errors}.
At any time $s$, the probability of needing more than $k$ backtracks until we reach a green region from $\region_s$ is at most $\left(\frac{1-b}{1+b}\right)^k$.
We solve $\delta = \left(\frac{1-b}{1+b}\right)^k$ for $k$.
Then we know that with probability at least $1 - \delta$, the target must be in the $k$-th ancestor of $\region_s$.
This is the region that we propose as the result of our search process.
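
Solving for $k$ explicitly, and rounding up because $k$ counts backtracking steps, gives the following. The numerical values in the example are chosen purely for illustration and are not taken from the experiments.
\begin{align*}
k = \left\lceil \frac{\ln \delta}{\ln \frac{1-b}{1+b}} \right\rceil
  = \left\lceil \frac{\ln (1/\delta)}{\ln \frac{1+b}{1-b}} \right\rceil,
\qquad \text{e.g. } b = \tfrac{1}{3},\ \delta = \tfrac{1}{8}
\implies \frac{1-b}{1+b} = \tfrac{1}{2},\ k = 3.
\end{align*}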

We now need to show that the expected depth of this region increases at a constant rate.
Since $k$ is a constant that only depends on the desired rate of error $\delta$ and does not change over time, it suffices to show that the expected depth of $\region_s$ increases at a constant rate.

We make use of the Markovian property of $\region_s$.
Without loss of generality, we assume that we are currently at time $s=0$.
Additionally we assume that the current region $\region_0$ is green.
When the execution of the algorithm begins, this is true since $ \targetx \in \Omega$.

We now define a stopping time $s' = \inf \{s>0 \mid \targetx \in \region_\searchstep\}$ for the search started in $\region_0$; this is the next time at which our algorithm visits a green region.
We will show that this stopping time is finite and that this next green node is, in expectation, at a higher depth.
The analysis then becomes recursive.
Specifically, we will show:
\begin{itemize}
  \item There is a constant \(C_d>0\), such that $\mathbb{E}[D(\region_{s'})-D(\region_0)]>C_d$
  \item There is a constant $C_s<\infty$, such that $\mathbb{E}[s'] < C_s$
\end{itemize}

Starting from the green region $\region_0$, the following transitions are possible:
\begin{itemize}
  \item With probability $\pdnrt(\region_0, \targetx)$, the search proceeds to a green child. In this case we stop immediately, $s' = 1$ and $D(\region_1) - D(\region_0) = 1$.
  \item With probability $\puprt(\region_0, \targetx)$, the search backtracks. Since the parent of a green region must be green as well, we also stop immediately, $s' = 1$.
  Since backtracking loses two levels of depth, we have $D(\region_1) - D(\region_0) = -2$.
  \item With probability $\pstray(\region_0, \targetx)$, the search strays.
\end{itemize}

The last case requires further analysis. 
Following Lemma \ref{lemma:stray_time}, we know that the expected stopping time after straying is upper bounded by $\mathbb{E}[\tau_{\rwbd=1}]=\frac{1}{b}$.
Every backtracking decision undoes at least one proceed decision.
This means that, in the worst case, exactly half of the steps until $s'$ are proceed decisions and half are backtrack decisions.
A pair of proceed and backtrack decisions first gains one level of depth and then loses two, for a net change of $-1$.
Therefore, conditioned on having left $\region_0$ by straying, the expected change in depth is bounded from below by
 $\mathbb{E}[D(\region_{s'})-D(\region_0) \mid \text{we strayed from $\region_0$}] \geq -1 \cdot \frac{1}{2} (1+ \mathbb{E}[\tau_{\rwbd=1}]) = -\frac{1}{2} (1 + \frac{1}{b}) = -\frac{b+1}{2b}$.

 In expectation, the number of timesteps that pass between consecutive green regions is $\mathbb{E}[s'] \leq \puprt + \pdnrt + \pstray (1 + \mathbb{E}[\tau_{\rwbd=1}]) =  \puprt + \pdnrt + \pstray \frac{b+1}{b}$.
 This means at time \(s\) we have, in expectation, visited \(\frac{s}{\puprt + \pdnrt + \pstray \frac{b+1}{b}}\) green nodes.

  In expectation, each consecutive green node lies at least \(\pdnrt - 2\puprt - \pstray \frac{b+1}{2b}\) levels deeper than its predecessor. Due to Assumption \ref{assump:depth_bias} we know that this quantity is strictly positive.


In expectation, the last green node that we have visited is at a depth of \(\frac{s}{\puprt + \pdnrt + \pstray \frac{b+1}{b}} \left(\pdnrt - 2\puprt - \pstray \frac{b+1}{2b}\right)\).
We also know an upper bound on the expected number of steps between green nodes: For any given state $\region_s$ of the search algorithm, we know that in expectation, we have taken at most
$\mathbb{E}[s'] \leq \puprt + \pdnrt + \pstray \frac{b+1}{b}$ steps since the last green region.
We are interested in a bound on the depth of the current region. In the worst case, all of these steps were backtracks.
This leads to
$\mathbb{E}[D(\region_s)] \geq \frac{s}{\puprt + \pdnrt + \pstray \frac{b+1}{b}} \left(\pdnrt - 2\puprt - \pstray \frac{b+1}{2b}\right) - 2 (\puprt + \pdnrt + \pstray \frac{b+1}{b})$,
which is a linear function of $s$.
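
The bound can be written more compactly. As a sketch, let $\mu_s$ and $\mu_D$ be shorthand introduced only for this remark. Since the three transitions out of a green region are exhaustive, $\puprt + \pdnrt + \pstray = 1$, and therefore
\begin{align*}
\mu_s &= \puprt + \pdnrt + \pstray \frac{b+1}{b} = 1 + \frac{\pstray}{b},\\
\mu_D &= \pdnrt - 2\puprt - \pstray \frac{b+1}{2b},\\
\mathbb{E}[D(\region_s)] &\geq \frac{\mu_D}{\mu_s}\, s - 2\mu_s.
\end{align*}
Because $\mu_D > 0$ by Assumption \ref{assump:depth_bias} and $1 \leq \mu_s < \infty$ for $b > 0$, the slope $\mu_D / \mu_s$ is a positive constant.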
\end{proof}



\begin{proof}[Proof for Lemma \ref{lemma:hyptest}]

  Let $\mathbf{x}_q = (1+\embdim)\mathbf{e}$.
  We denote the probability of the query point inside of $\region$ being preferred as $\mathbb{P}[\vec{0} \succ \mathbf{x}_q | \targetx] = \mathbb{P}[\region \succ F | \targetx]$.
  
  We will now show that there are two probabilities $p_\region > p_\far>0$ such that:
    \begin{itemize}
      \item \(\targetx \in \region \implies \mathbb{P}[\region \succ F | \targetx] \geq p_\region\)
      \item \(\targetx \in \far \implies \mathbb{P}[\region \succ F | \targetx] \leq p_\far\)
    \end{itemize}
  This gap between $p_\region$ and $p_\far$ immediately allows the use of a binomial test.
  Any desired level of accuracy can be reached by repeating the query often enough.
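
  To make this concrete, here is a sketch of one possible sample-size calculation. It assumes a specific test that is not prescribed by the lemma: repeat the query $n$ times, let $\hat{p}$ be the observed fraction of replies that prefer $\vec{0}$, and decide $\targetx \in \region$ iff $\hat{p}$ exceeds the midpoint $\frac{p_\region + p_\far}{2}$. The symbols $n$, $\hat{p}$ and the target error probability $\delta$ are introduced only for this illustration. Because the replies are independent given $\targetx$, Hoeffding's inequality bounds both types of error:
  \begin{align*}
  \mathbb{P}[\text{error}]
    \leq \exp\left(-2n\left(\frac{p_\region - p_\far}{2}\right)^{2}\right)
    = \exp\left(-\frac{n\,(p_\region - p_\far)^{2}}{2}\right)
    \leq \delta
  \quad \text{whenever} \quad
  n \geq \frac{2 \ln (1/\delta)}{(p_\region - p_\far)^{2}}.
  \end{align*}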
  
  The target location inside \(\region\) for which \(\mathbb{P}[\region \succ F | \targetx]\) is smallest is 
  \(\mathbf{x}_c = \argmin_{\targetx \in \region} \mathbb{P}[\region \succ F | \targetx] = \mathbbm{1}\).
  We call this point \(\mathbf{x}_c\) since it lies in a corner of the hypercube. A formal proof that the minimum is found at $\mathbf{x}_c$ is derived with sympy \cite{meurer2017sympy} and included in the supplementary code.
  For any parametrization of \(\gamma\)-CKL (or another scale-free oracle model) we can now explicitly calculate the lower bound: \(p_\region =\mathbb{P}[\region \succ F | \targetx = \mathbbm{1}]\).
  
  We define the following distances:
  \begin{align*}
    d_{c} &= ||\vec{0} - \mathbf{x}_c||=\sqrt{\embdim}\\
    d_{qc} &= ||\mathbf{x}_q - \mathbf{x}_c||=\sqrt{\embdim^2 + \embdim-1}\\
    d_{q} &= ||\vec{0} - \mathbf{x}_q||=\embdim + 1
  \end{align*}
  
  The ratio of distances between \(\mathbf{x}_c\) and the two query points is 
  \(\frac{||\vec{0} - \mathbf{x}_c||}{||\mathbf{x}_q - \mathbf{x}_c||} = \frac{d_{c}}{d_{qc}}\).
  We know that any point \(\mathbf{x}'\) that induces the same outcome probability must have the same ratio of distances: \(\mathbb{P}[\vec{0} \succ \mathbf{x}_q | \targetx=\mathbf{x}'] = \mathbb{P}[\vec{0} \succ \mathbf{x}_q | \targetx=\mathbf{x}_c] \iff \frac{||\vec{0} - \mathbf{x}'||}{||\mathbf{x}_q - \mathbf{x}'||} = \frac{d_{c}}{d_{qc}}\).
  
  Out of these points, the one with the least distance to \(\vec{0}\) lies on the line segment between \(\vec{0}\) and \(\mathbf{x}_q\). %
  The point with the largest distance to \(\vec{0}\) lies on the ray from \(\mathbf{x}_q\) to \(\vec{0}\),
  at \(\vec{0} - \mathbf{e} \frac{\embdim + \sqrt{\embdim^{3} + \embdim^{2} - \embdim}}{\embdim - 1}\).
  This can be found by solving the condition of equal ratio for the x coordinate. 
  A full derivation in sympy can be found in the supplementary material.
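  The closed form can also be recovered by hand. As a sketch, write the candidate point as $-r\,\mathbf{e}$ with $r > 0$ (the symbol $r$ is introduced only for this calculation), so that its distances to $\vec{0}$ and $\mathbf{x}_q$ are $r$ and $r + \embdim + 1$:
  \begin{align*}
  \frac{r}{r + \embdim + 1} &= \frac{d_c}{d_{qc}} = \frac{\sqrt{\embdim}}{\sqrt{\embdim^{2} + \embdim - 1}}\\
  \implies r &= \frac{\sqrt{\embdim}\,(\embdim + 1)}{\sqrt{\embdim^{2} + \embdim - 1} - \sqrt{\embdim}}\\
  &= \frac{\sqrt{\embdim}\,(\embdim + 1)\left(\sqrt{\embdim^{2} + \embdim - 1} + \sqrt{\embdim}\right)}{\embdim^{2} - 1}\\
  &= \frac{\embdim + \sqrt{\embdim^{3} + \embdim^{2} - \embdim}}{\embdim - 1}.
  \end{align*}
  The last step cancels the factor $\embdim + 1$ using $\embdim^{2} - 1 = (\embdim - 1)(\embdim + 1)$, which requires $\embdim > 1$.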
  We know that all points $\mathbf{x}'$ with a ratio that is strictly larger than \(\frac{d_{c}}{d_{qc}}\) must induce a smaller probability \(\mathbb{P}[\vec{0} \succ \mathbf{x}_q | \targetx=\mathbf{x}'] < p_\region\).
  As stated above, the point farthest away from $\vec{0}$ which still has this ratio lies at $\vec{0} - \mathbf{e} \hat{r}$, with \(\hat{r} = \frac{\embdim + \sqrt{\embdim^{3} + \embdim^{2} - \embdim}}{\embdim - 1}\).
  This allows us to specify an uncertainty region.
  Let \(\radiusU = \hat{r} + 1\).
  The probability \(p_\far\) can now be explicitly computed (for any parametrization of \(\gamma\)-CKL or other scale-free oracle models) as
  \(p_\far = \mathbb{P}[\vec{0} \succ \mathbf{x}_q | \targetx = \vec{0} - \mathbf{e} (\hat{r} + 1)] < p_\region\).
  
  
  \end{proof}

\begin{proof}[Proof for Lemma \ref{lemma:bbox}]
Let $\boundingbox'$ be a hypercube, centered at $\targetx$ and with edge length $2 \cdot \frac{1}{16} = \frac{1}{8}$.
If a cell $\cell{k}$ is in class (A) or (B) then its center $\mathbf{x}_{\cell{k}}$ must lie in the hypercube $\boundingbox'$.
We now extend the edge length of this hypercube so that it fully contains any cell whose center lies in $\boundingbox'$.
Let $\boundingbox$ be a hypercube, centered at $\targetx$ and with edge length $2 \cdot \frac{1}{16} + r_c$.
It follows immediately that $\bigcup_{\cell{k} \text{ has class (B) or (A)}} \cell{k} \subseteq \boundingbox$.
We have chosen $r_c < \frac{1}{8\radiusU} < \frac{1}{8}$.
Therefore the edge length of $\boundingbox$ is $2 \cdot \frac{1}{16} + r_c < \frac{1}{8} + \frac{1}{8} = \frac{1}{4}$.
\end{proof}


\begin{proof}[Proof for Theorem \ref{theorem:innerloop_main}]
The tiling $\tiling{\slack}{r_c}$ contains $K$ cells; this is also the number of hypothesis tests that we conduct.
Conditional on $\targetx$, the oracle replies are independent, and therefore the test outcomes are independent.
We assume that the probability of error for any one of the tests is $\delta$.
The probability of no error occurring across all tests is therefore $(1-\delta)^K$.
To achieve an overall error probability of $\hat{\delta}$ we need $\hat{\delta} =1- (1-\delta)^K$.
Lemma \ref{lemma:hyptest} ensures that we can adjust the hypothesis test for any desired probability of error.
It is therefore always possible to choose a number of query observations (depending on the dimensionality and the parameters of the choice model) that leads to the desired $\hat{\delta}$.
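
Solving the relation $\hat{\delta} = 1 - (1-\delta)^K$ for the per-test error probability gives an explicit target. As a sketch:
\begin{align*}
\delta = 1 - (1-\hat{\delta})^{1/K} \geq \frac{\hat{\delta}}{K},
\end{align*}
where the inequality follows from Bernoulli's inequality $(1-x)^K \geq 1 - Kx$ applied with $x = \hat{\delta}/K$. Running each test at the simpler level $\hat{\delta}/K$ is therefore sufficient, at the cost of being slightly conservative.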

In the following we assume that all hypothesis tests have provided correct information.
This means that (H) has not been rejected for the cells in class (A) and it has been rejected for the cells in class (C).
We create a bounding box $\mathcal{B}$ around all cells for which hypothesis (H) has not been rejected.
From Lemma \ref{lemma:bbox} we know that this bounding box has an edge length of at most $\frac{1}{4}$.
We now look at all possible locations for $\targetx$ and verify that the decision criterion must lead to a correct decision.

\textbf{Case 1}, $\targetx \notin (\slack \cup \region)$:

The bounding box cannot overlap with $\region$. Therefore we backtrack. This is the correct decision.

\textbf{Case 2}, $\targetx \in \region$:

There is a cell in class (A), which overlaps with $\region$.
For this cell, the hypothesis (H) has not been rejected.
This means that the bounding box must overlap with $\region$.
Also, we know that the bounding box has an edge length of less than $\frac{1}{4}$.
This means that it can overlap with at most 2 of the tiles in $\tiling{\slack}{1/4}$.
This means that there is a child region in $\children{\region}$ which fully contains the bounding box.
Our decision criterion proceeds to this child.
We also know that the bounding box must contain the target (since we are assuming that all hypothesis tests have returned correctly).
This ensures that we are proceeding to a green region.

\textbf{Case 3}, $\targetx \in \slack$:

The target is not in the current region, i.e. $\region$ is a red region.
If the bounding box happens to not overlap with $\region$, we backtrack, which is considered a correct decision.
If the bounding box happens to overlap with $\region$, then we know that there must be a child which fully contains the bounding box.
Our decision criterion proceeds to this child region.
We also know that the bounding box contains the target. So we are proceeding to a green region.
This is a recovery transition, and it is also considered a correct decision.

We have shown that, under the assumption that all hypothesis tests have provided correct information, the decision criterion leads to a correct transition.
Our assumption on the hypothesis tests holds with probability $1 - \hat{\delta}$.

If some hypothesis tests are erroneous, then we can see inconsistent behaviour.
For example, it is possible that the bounding box is too large, and overlaps with multiple child regions, or overlaps with both $\region$ and $\Omega \setminus \slack$.
We assume that in this case, we backtrack.
This can be the wrong decision, but it happens with probability at most $\hat{\delta}$.

\end{proof}
