\section{Information-Theoretic Lower Bound}
\label{Sec:Information-TheoreticLB}
We now derive, via Le Cam's classical two-point (binary hypothesis testing) method~\citep{lecam1973convergence}, an information-theoretic lower bound on the accuracy achievable by any inverse estimator.
Essentially, this approach constructs two bandit instances $\mathcal{M}_1 = (\theta_1^*, \mathcal{A})$ and $\mathcal{M}_2 = (\theta_2^*, \mathcal{A})$ that share the action set $\mathcal{A}$ but differ in their reward parameters, and lets the forward algorithm interact with one of them.
We then show that, given a single demonstration of \emph{any} forward algorithm that incurs regret at least $\widetilde{\Omega}(\sqrt{dT})$ and sufficiently explores each direction, the inverse algorithm cannot distinguish which of the two instances the forward algorithm interacted with.
Since the minimax regret of stochastic linear bandits with finite action sets is known to scale as $\sqrt{dT}$ up to logarithmic factors~\citep{lattimore_szepesvári_2020}, no forward algorithm can achieve regret of smaller order in the worst case, and this in turn implies a fundamental limit on inverse estimation. Theorem~\ref{thm:lower_bound} is proved in \Cref{sec:appendlowerbound}.
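
The following is a minimal sketch of how the two-point reduction quantifies this indistinguishability; the notation $P_i$, $E_i$, and $\Delta$ is introduced here only for illustration, and the sketch is not necessarily the argument used in \Cref{sec:appendlowerbound}. Let $P_1$ and $P_2$ denote the laws of the demonstration under $\mathcal{M}_1$ and $\mathcal{M}_2$, let $\Delta = \|\theta_1^* - \theta_2^*\|_2$, and let $E_i = \{\|\hat{\theta} - \theta_i^*\|_2 \geq \Delta/2\}$. By the triangle inequality, $\|\hat{\theta} - \theta_1^*\|_2 + \|\hat{\theta} - \theta_2^*\|_2 \geq \Delta$, so $E_1^c \subseteq E_2$, and the Bretagnolle--Huber inequality gives
\begin{align*}
P_1(E_1) + P_2(E_2) \;\geq\; P_1(E_1) + P_2(E_1^c) \;\geq\; \tfrac{1}{2}\exp\bigl(-\mathrm{KL}(P_1 \,\|\, P_2)\bigr).
\end{align*}
Combining this with Markov's inequality, $\mathbb{E}_i\|\hat{\theta} - \theta_i^*\|_2 \geq \tfrac{\Delta}{2}\, P_i(E_i)$, yields
\begin{align*}
\max_{i \in \{1,2\}} \mathbb{E}_i\bigl[\|\hat{\theta} - \theta_i^*\|_2\bigr] \;\geq\; \frac{\Delta}{8}\exp\bigl(-\mathrm{KL}(P_1 \,\|\, P_2)\bigr).
\end{align*}
Because the demonstration is a function of the full interaction history, the data-processing inequality bounds $\mathrm{KL}(P_1 \,\|\, P_2)$ by the divergence between the laws of the complete histories, so it suffices to exhibit two instances that are far apart in $\ell_2$ but whose histories have bounded divergence.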

% Intuitively, we need to upper bound how much the forward algorithm can explore in the direction of $\theta' - \theta$. Since Phased Elimination achieves a regret bound of $\mathcal{O}(\sqrt{dT})$, it cannot explore every suboptimal direction as much as the direction of the optimal arm. The more an algorithm explores suboptimal directions, the more an inverse algorithm can use the number of pulls in the direction of $\theta' - \theta$ to determine whether the reward parameter is $\theta$ or $\theta'$. The extent to which any forward algorithm can explore a given direction while achieving a regret bound of $\mathcal{O}(\sqrt{dT})$ is characterized by \cite{Banerjee2022}; intuitively, it is quantified by an upper bound on the eigenvalues.




\begin{restatable}{theorem}{lowerbound}
    \label{thm:lower_bound}
     For a bandit instance $\mathcal{M}$ with reward parameter $\theta_1^*$ and action set $\mathcal{A}$, there exists a bandit instance $\mathcal{M}'$ with reward parameter $\theta_2^*$ and the same action set $\mathcal{A}$ such that any inverse estimator $\hat{\theta}$ incurs error
    \[
        \max\bigl\{\|\hat{\theta} - \theta_1^*\|_2,\, \|\hat{\theta} - \theta_2^*\|_2\bigr\} = \widetilde{\Omega}\left(\sqrt{\frac{d}{T}}\right).
    \]
\end{restatable}
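For intuition on where the $\sqrt{d/T}$ rate can come from, here is a minimal sketch under assumptions added purely for illustration: unit-variance Gaussian reward noise, $\|a\|_2 \leq 1$ for every $a \in \mathcal{A}$, and a fixed forward algorithm whose action at round $t$ is $A_t$, with design matrix $V_T = \sum_{t=1}^{T} A_t A_t^\top$. It is not necessarily the construction used in \Cref{sec:appendlowerbound}. Let $P_1$ and $P_2$ denote the laws of the full interaction histories under $\theta_1^*$ and $\theta_2^* = \theta_1^* + \varepsilon v$ with $\|v\|_2 = 1$. Because the action set is shared and only the mean rewards differ, the divergence decomposition gives
\begin{align*}
\mathrm{KL}(P_1 \,\|\, P_2) \;=\; \frac{1}{2}\sum_{t=1}^{T}\mathbb{E}_1\bigl[\langle A_t, \theta_1^* - \theta_2^*\rangle^2\bigr] \;=\; \frac{\varepsilon^2}{2}\, v^\top \mathbb{E}_1[V_T]\, v .
\end{align*}
Since $\operatorname{tr}(\mathbb{E}_1[V_T]) = \sum_{t=1}^{T} \mathbb{E}_1\|A_t\|_2^2 \leq T$, the smallest eigenvalue of $\mathbb{E}_1[V_T]$ is at most $T/d$. Taking $v$ to be a corresponding unit eigenvector and $\varepsilon = \sqrt{d/T}$ gives $\mathrm{KL}(P_1 \,\|\, P_2) \leq 1/2$ while $\|\theta_1^* - \theta_2^*\|_2 = \sqrt{d/T}$. By the data-processing inequality the same divergence bound holds for the laws of any demonstration computed from the histories, so the two-point bound $\max_i \mathbb{E}_i\|\hat{\theta} - \theta_i^*\|_2 \geq \frac{\Delta}{8}\exp(-\mathrm{KL})$ with $\Delta = \|\theta_1^* - \theta_2^*\|_2$ (Le Cam's method combined with the Bretagnolle--Huber inequality) yields
\begin{align*}
\max_{i \in \{1,2\}} \mathbb{E}_i\bigl[\|\hat{\theta} - \theta_i^*\|_2\bigr] \;\geq\; \frac{e^{-1/2}}{8}\sqrt{\frac{d}{T}} .
\end{align*}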
%
