\section{Properties of Phased Elimination}
% 1. what this section aims to prove (how and when phased elimination eliminates arms)
% 2. three points we want to prove  (accuracy of algorithm, Given accuracy it doenst eliminate optimal arm, given optimal arm isnt deleted bad arm is deleted, an independent set of arms is eliminated)

To build an efficient inverse algorithm, we must first understand how the forward algorithm works. This section therefore analyzes how and when Phased Elimination eliminates arms, since the elimination times of particular arms will inform the inverse algorithm. Specifically, we prove four properties of the Phased Elimination algorithm.
\begin{enumerate}
    \item The forward algorithm has an accurate estimate of the true parameter $\theta^*$ at each time step with high probability.
    \item Given the forward algorithm accurately estimates $\theta^*$, it does not mistakenly eliminate the optimal arm $A^*$.
    \item Given the optimal arm is not deleted, a set of suboptimal arms is correctly eliminated at each phase.
    \item That set of eliminated suboptimal arms contains a linearly independent set of arms. 
\end{enumerate}

These results build on one another and culminate in the claim that a linearly independent set of arms is eliminated at each phase. With that intuition in hand, we can build our inverse estimator. The proofs of these results, along with more thorough explanations, are deferred to the appendix.
First, given how our G-Optimal design is chosen, the estimation error of the forward algorithm is small in every phase.

\begin{restatable}[\textbf{Demonstrator's Estimation Error}]{lemma}{errorgoodterm}
\label{lem:error_good_term}
Following \citep{batchedbandits}, suppose that at each phase a G-Optimal design is chosen with probability parameter $\delta \leq \frac{\gamma}{\Psi_dL^3}$ and error parameter $\varepsilon_l$. Then, with probability at least $1 - \frac{1}{L^2}$, for every $A \in \mathbb{A}_l$, we have $$\lvert \langle A, \hat{\theta}_l - \theta^* \rangle \rvert \leq \varepsilon_l\text{.}$$
\end{restatable}
\label{sec:phased_elim_props}
The accuracy of the forward algorithm's estimate $\hat{\theta}_l$ underpins its low regret. In particular, it ensures that arms with large rewards, including the optimal arm, are not eliminated. We show this in the following corollary.
\begin{restatable}{corollary}{bestarmactive}
\label{corr:best_arm_active}
With probability at least $1 - \frac{1}{L^2}$, for every phase $l$, $A^* \in \mathbb{A}_l$.
\end{restatable}
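
To sketch why \Cref{corr:best_arm_active} follows from \Cref{lem:error_good_term}, assume (this rule is not restated in this section) that the forward algorithm removes an arm $A$ at the end of phase $l$ only when $\max_{B \in \mathbb{A}_l} \langle B - A, \hat{\theta}_l \rangle > 2\varepsilon_l$, and that $A^*$ maximizes $\langle A, \theta^* \rangle$ over the arms. On the event of \Cref{lem:error_good_term}, if $A^* \in \mathbb{A}_l$, then for every $B \in \mathbb{A}_l$,
\[
\langle B - A^*, \hat{\theta}_l \rangle \leq \langle B - A^*, \theta^* \rangle + \lvert \langle B, \hat{\theta}_l - \theta^* \rangle \rvert + \lvert \langle A^*, \hat{\theta}_l - \theta^* \rangle \rvert \leq 0 + 2\varepsilon_l,
\]
so under the assumed rule $A^*$ is not removed in phase $l$, and induction over the phases gives the claim. This is only a sketch under the stated assumptions; the appendix proof remains the authoritative argument.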

Conversely, the accuracy of the forward algorithm, together with the fact that it keeps the optimal arm in the active set, ensures that poor arms are eliminated. In fact, the phase at which an arm is eliminated is closely tied to the suboptimality of that arm. This is formalized in \Cref{lem:sub_gets_deleted}.
\begin{restatable}[\textbf{Elimination of Suboptimal Arms}]{lemma}{subgetsdeleted}
    \label{lem:sub_gets_deleted}
    Let $l_A = \min\{l \text{ s.t. } 4\varepsilon_l \leq \nabla_A\}$ be the first phase at which double the elimination criterion, $4\varepsilon_l$, drops to or below the suboptimality gap $\nabla_A$ of arm $A$. With probability at least $1 - \frac{1}{L^2}$, arm $A$ will be deleted before phase $l_A$.
\end{restatable}
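
To illustrate \Cref{lem:sub_gets_deleted}, we sketch the underlying calculation under assumptions that are not fixed in this section: the gap is $\nabla_A = \langle A^* - A, \theta^* \rangle$, and an arm is removed once its estimated gap to some active arm exceeds $2\varepsilon_l$. On the event of \Cref{lem:error_good_term}, with $A^*$ and $A$ both active at a phase $l$ satisfying $4\varepsilon_l \leq \nabla_A$,
\[
\langle A^* - A, \hat{\theta}_l \rangle \geq \nabla_A - 2\varepsilon_l \geq 2\varepsilon_l,
\]
with strict inequality whenever $4\varepsilon_l < \nabla_A$, in which case $A$ meets the assumed removal condition at that phase. As a purely illustrative instance, if $\varepsilon_l = 2^{-l}$ and $\nabla_A = 0.1$, then $4\varepsilon_5 = 0.125 > 0.1$ while $4\varepsilon_6 = 0.0625 \leq 0.1$, so $l_A = 6$.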

Given this lemma, we can estimate an arm's true reward from the phase at which the forward algorithm eliminates it. This intuition guides the design of our inverse estimator. We now make an important claim about the set of suboptimal arms deleted at each phase: \textit{it contains a linearly independent subset of arms}. Intuitively, we can find a linearly independent set of suboptimal actions whose estimated rewards cause them to be eliminated at phase $l$. This claim is formalized in \Cref{lem:linearly_independent}.
\begin{restatable}[\textbf{Linearly Independent Set of Eliminated Arms}]{lemma}{linearlyindependent}
    \label{lem:linearly_independent}
    Let $\mathbb{E}_l = \mathbb{A}_{l-1} \setminus \mathbb{A}_{l}$ denote the set of arms eliminated at phase $l$. With probability at least $1-\frac{1}{L^2}$, there exists a subset of arms $\mathbf{A}_l \subseteq \mathbb{E}_l$ that is linearly independent and spans $\mathbb{R}^d$.
\end{restatable}
To give intuition for where the arms in this linearly independent set lie, we provide \Cref{cor:existence_of_arm}. The linearly independent set can be visualized as forming a cone around the optimal arm. In each direction associated with a simplex vertex $S_i$, we can lower bound how close the edge of the cone lies to the optimal arm, up to discretization factors arising from our assumption that $\gamma \leq \frac{2\varepsilon_l}{\norm{\theta}_2}$. This picture follows naturally from our smoothness assumption (\Cref{ass:lip_smooth}).
% To give intuition of where this linearly independent set of arms lie, we provide the following lemma. It states that a linearly independent set of arms is found by taking the arm in $\mathbb{E}_l$ that is $\gamma$-close to $f(\beta, i)$ where $\beta \geq  \left[\frac{6*2^{-l}}{\mathbb{L}}\right]^{\frac{1}{\omega}}$ for every $i \in [d]$. For every $\beta$ satisfying this constraint and dimension $i$, such an arm in $\mathbb{E}_l$ is guaranteed to exist with high probability. This result is natural from \Cref{ass:lip_smooth} where the smoothness of the reward function states helps state where suboptimal arms will lie around the optimal arm. Here, we reiterate the importance of our density assumption, that  $\gamma \leq \frac{2\epsilon_l}{\norm{\theta}_2}$. This density assumption helps ensure that there always exists an action in the Eliminated Set close to a rotation of the optimal arm. This is key for the proof of \Cref{cor:existence_of_arm}. 
\begin{restatable}{lemma}{existenceofarm}
    \label{cor:existence_of_arm}
    For every index $i \in [d]$, with probability at least $1 - \frac{1}{L^2}$, there exist a $\beta \geq \left[\frac{6 \cdot 2^{-l}}{\mathbb{L}}\right]^{\frac{1}{\omega}}$ and an arm $v \in \mathbb{E}_l$ such that $v$ is $\gamma$-close to $f(\beta, i)$.
\end{restatable} 
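
As a purely numerical illustration of the threshold in \Cref{cor:existence_of_arm}, write $\beta_l = \left[\frac{6 \cdot 2^{-l}}{\mathbb{L}}\right]^{\frac{1}{\omega}}$ for the lower bound on $\beta$ (the symbol $\beta_l$ is introduced only for this remark). Consecutive phases then satisfy
\[
\frac{\beta_{l+1}}{\beta_l} = 2^{-\frac{1}{\omega}},
\]
so the bound shrinks geometrically in $l$. For the hypothetical values $\mathbb{L} = 1$ and $l = 3$, this gives $\beta_3 = 0.75$ when $\omega = 1$ and $\beta_3 = \sqrt{0.75} \approx 0.866$ when $\omega = 2$.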

