\section{Experiments}
\begin{figure*}[t!]
  \centering
  \subfigure[$\ell_1$ Ball.  Slopes of $-0.487, -0.491, -0.453$ for orange, blue, and green best-fit lines.]{\includegraphics[width=0.32\textwidth]{figures/l1_figure.pdf}\label{fig:l1 perf}}
    \hfill
  \subfigure[$\ell_2$ Ball. Slopes of $-0.490, -0.435, -0.433$ for orange, blue, and green best-fit lines.]{\includegraphics[width=0.32\textwidth]{figures/l2_figure.pdf}
         \label{fig:l2 perf}}
        \hfill
    \subfigure[$\ell_5$ Ball. Slopes of $-0.455, -0.427, -0.413$ for orange, blue, and green best-fit lines.]{\includegraphics[width=.32\textwidth]{figures/l5_figure.pdf}
         \label{fig:l5 perf}}
  \caption{Relative error of the inverse estimator (averaged over 100 trials) on the $\ell_1$, $\ell_2$, and $\ell_5$ balls for dimensions $d = 6, 7, 8$. The shaded regions show the standard deviation in each phase. All panels use log-log axes, and the orange, blue, and green dotted lines are linear fits in log-log space, one per dimension.}
  
     \label{fig:three graphs}
\end{figure*}
\label{sec:experiments}
To validate our results empirically, we evaluate our inverse estimator in both simulated and semi-synthetic environments, measuring the error of its estimate of $\theta^*$. Because Phased Elimination proceeds in phases, it is most natural to run it, and hence our estimator, for a fixed number of phases rather than a fixed number of rounds; see \Cref{sec:practical_phased_elim} for a formal description.

% the forward algorithm is \Cref{{alg:practical_phased_elim}}, which is identical to Algorithm~\ref{alg:phased_elim} except that the algorithm is run for a fixed number of phases rather than a fixed number of rounds.

%Further details on the practical implementation of our inverse algorithm can be found in \Cref{sec:fastalg}.
% \subsection{Experiment Setup}
% We briefly explain the setup of our experiments. We wish to demonstrate that our inverse estimator is both accurate and sample-efficient. Over several experimental setups, we use our inverse learning algorithm to learn the reward function of the forward algorithm. We measure its accuracy and the number of samples from the forward algorithm required to achieve its accuracy. We see that our empirical results corroborate our theoretical results.  

\subsection{Simulations}
To construct our action sets, for each $p \in \{1, 2, 5\}$ we sample $4000$ vectors from the surface of the unit $\ell_p$ ball and use this finite set as $\mathcal{A}$. Specifically, we sample each entry independently from a generalized Gaussian distribution with density proportional to $\exp(-|x|^\beta)$, where $\beta = p$ (i.e., $\beta = 1$, $2$, and $5$, respectively), and then normalize the resulting vector by its $\ell_p$ norm~\citep{barthe2005probabilistic}.
The noise in the observed reward is Gaussian with mean $0$ and variance $0.02$.
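As an illustration of the sampling scheme above, the following is a minimal NumPy sketch, not our experimental code. It draws each generalized Gaussian coordinate through a Gamma transformation: if $G \sim \mathrm{Gamma}(1/p, 1)$, then $G^{1/p}$ with an independent random sign has density proportional to $\exp(-|x|^p)$. The helper names \texttt{sample\_lp\_sphere} and \texttt{noisy\_reward}, the dimension $d = 6$, the seed, and the choice of $\theta^*$ are hypothetical and purely illustrative.

```python
import numpy as np

def sample_lp_sphere(n, d, p, rng):
    """Return an (n, d) array whose rows lie on the unit l_p sphere.

    Hypothetical helper: each coordinate is drawn i.i.d. with density proportional
    to exp(-|x|^p), i.e. a generalized Gaussian with beta = p, and every row is
    then divided by its l_p norm.
    """
    # If G ~ Gamma(1/p, 1), then G**(1/p) has density proportional to exp(-y**p) on y > 0.
    magnitudes = rng.gamma(shape=1.0 / p, scale=1.0, size=(n, d)) ** (1.0 / p)
    # An independent random sign makes each coordinate symmetric about zero.
    signs = rng.choice([-1.0, 1.0], size=(n, d))
    x = signs * magnitudes
    norms = np.sum(np.abs(x) ** p, axis=1, keepdims=True) ** (1.0 / p)
    return x / norms

def noisy_reward(action, theta_star, rng, noise_var=0.02):
    """Hypothetical helper: linear reward <action, theta*> plus N(0, noise_var) noise."""
    return float(action @ theta_star + rng.normal(0.0, np.sqrt(noise_var)))

rng = np.random.default_rng(0)
d = 6
action_sets = {p: sample_lp_sphere(4000, d, p, rng) for p in (1, 2, 5)}
theta_star = sample_lp_sphere(1, d, 2, rng)[0]  # illustrative choice of theta* only
r = noisy_reward(action_sets[2][0], theta_star, rng)
```

Note that \texttt{rng.normal} takes a standard deviation, so the variance $0.02$ enters through its square root.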

Using the practical implementation in~\Cref{alg:practical_phased_elim}, we run $100$ trials of a bandit instance with maximum number of phases $L \in \{3,4,5,6\}$ in dimensions $d \in \{3,\ldots,8\}$. We then run the inverse estimator on each instance and measure the relative error of its estimate $\hat{\theta}$, defined as $\frac{\norm{\hat{\theta} - \theta^*}_2}{\norm{\theta^*}_2}$, recording this error at the last round of the final phase.% (see \Cref{sec:synth_results} for more details).
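For concreteness, the relative error metric and its aggregation across trials (the mean and standard deviation shown in our figures) can be sketched as follows. The helper names \texttt{relative\_error} and \texttt{summarize\_trials} are hypothetical, and using the population standard deviation (NumPy's default) is an assumption.

```python
import numpy as np

def relative_error(theta_hat, theta_star):
    """Relative l2 error ||theta_hat - theta_star||_2 / ||theta_star||_2."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    return np.linalg.norm(theta_hat - theta_star) / np.linalg.norm(theta_star)

def summarize_trials(theta_hats, theta_stars):
    """Mean and (population) standard deviation of the relative error over trials."""
    errors = np.array([relative_error(h, s) for h, s in zip(theta_hats, theta_stars)])
    return errors.mean(), errors.std()

# Example: an estimate off by 0.1 along one axis of a unit-norm theta*.
err = relative_error([1.1, 0.0], [1.0, 0.0])
```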

\begin{table*}
  \centering
  \begin{sc}
    \begin{tabular}{lcccccc}
        \toprule
        \multicolumn{1}{c}{} & \multicolumn{2}{c}{$\ell_1$ Ball} & \multicolumn{2}{c}{$\ell_2$ Ball} & \multicolumn{2}{c}{$\ell_5$ Ball}\\
        \cmidrule(rl){2-3} \cmidrule(rl){4-5} \cmidrule(rl){6-7} 
          $d$                & Inverse             & Forward             & Inverse             & Forward      & Inverse & Forward             \\ 
        \midrule
        %\cmidrule(r){1-1} \cmidrule(rl){2-5} \cmidrule(l){6-6}
          $3$ & $0.247$ & $0.053$ & $0.011$ & $0.002$ & $0.054$ & $0.083$\\
          $4$ & $0.352$ & $0.097$ & $0.071$ & $0.002$ & $0.172$ & $0.124$\\
          $5$ & $0.464$ & $0.230$ & $0.108$ & $0.004$ & $0.247$ & $0.178$\\
          $6$ & $0.499$ & $0.401$ & $0.138$ & $0.122$ & $0.338$ & $0.249$\\
          $7$ & $0.551$ & $0.586$ & $0.247$ & $0.391$ & $0.324$ & $0.451$\\
          $8$ & $0.587$ & $1.392$ & $0.281$ & $1.136$ & $0.379$ & $0.722$\\
        \bottomrule
    \end{tabular}
  \end{sc}
  \caption{Relative error of the inverse estimator and of the forward algorithm's final-round estimate $\hat{\theta}$ for each action set and dimension.}
  \label{tab:simul_results}
\end{table*}
% For our first set of experiments, we utilize uniform random samplings of the $\ell_1$, $\ell_2$, and $\ell_5$ unit balls with dimensions ranging from 3 to 8 as action sets. $\theta^*$ is sampled uniformly form the respective ball. For each action set, we then run \Cref{alg:phased_elim} for $100$ sampled bandit instances for as many as $6$ phases and then run the inverse estimator for each phase. We then measure the metric of relative error of $\hat{\theta}$, defined as $\frac{\norm{\hat{\theta} - \theta^*}_2}{\norm{\theta^*}_2}$.% Our results can be found in \Cref{fig:l1 perf}, \Cref{fig:l2 perf}, and \Cref{fig:l5 perf} via log-log plots along with a linear fit to corroborate the asymptotic behavior of the bounds given by \Cref{thm:accuracy_theta_est}.
% \begin{wrapfigure}{R}{0.5 \textwidth}
\begin{figure}
\vspace{-1em}
    \centering
    \includegraphics[width=0.45 \textwidth]{figures/dimension.pdf}
    \caption{Relative error of the inverse estimator as a function of the dimension $d$ on each action set. The shaded region represents the standard deviation.}
    \label{fig:synth_dimension}
    \vspace{-1.5em}
    \end{figure}
% \end{wrapfigure}

On the one hand, \Cref{thm:accuracy_theta_est} predicts that the relative error decays with the total number of rounds $T$. The log-log plots in \Cref{fig:three graphs} confirm this trend for each action set: the best-fit lines in \Cref{fig:l1 perf}, \Cref{fig:l2 perf}, and \Cref{fig:l5 perf} have slopes in the range $\left[-0.491, -0.413\right]$, indicating polynomial decay in $T$. %The varying slopes between each plot may result from $\omega$ varying with the choice of action set. 
On the other hand, \Cref{thm:accuracy_theta_est} also predicts that the relative error increases with $d$. In \Cref{fig:synth_dimension}, we plot the relative error of our inverse estimator on each unit ball for each dimension from $3$ to $8$, confirming that higher-dimensional action sets generally incur higher relative error. 
Furthermore, \Cref{tab:simul_results} shows that at $d = 6$ the inverse estimator's error is of the same order as that of the forward algorithm's final-round estimate $\hat{\theta}$, and for $d \geq 7$ the inverse estimator incurs lower relative error than the forward estimate on all three action sets.
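The slopes reported in \Cref{fig:three graphs} summarize the decay rate in log-log space: if the error behaves like $c\,T^{-\alpha}$, then $\log(\text{error}) = \log c - \alpha \log T$, so the fitted slope estimates $-\alpha$. A minimal sketch of such a fit, assuming an ordinary least-squares line computed with NumPy's \texttt{polyfit} (the helper name \texttt{loglog\_slope} is hypothetical), checked on synthetic data rather than on our experimental results:

```python
import numpy as np

def loglog_slope(rounds, errors):
    """Slope of the least-squares line through the points (log T, log error)."""
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, _intercept = np.polyfit(np.log(rounds), np.log(errors), deg=1)
    return slope

# Synthetic check (not our experimental data): error = 2 * T^(-1/2).
T = np.array([100.0, 400.0, 1600.0, 6400.0])
err = 2.0 * T ** -0.5
slope = loglog_slope(T, err)  # equals -0.5 up to floating-point error
```

The choice of logarithm base does not affect the slope, since it rescales both axes by the same factor.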
%Each panel in \Cref{fig:three graphs} contains a plot of the relative error compared against time for either the $\ell_1$, $\ell_2$, or $\ell_5$ ball in log-log scale, accompanied by a log-log scale linear fit. We find that the slopes of each line in \Cref{fig:l1 perf} and \Cref{fig:l2 perf} are very close to $-\frac{1}{2}$, corresponding to the upper bound established in \Cref{thm:accuracy_theta_est}. \Cref{fig:l5 perf} likewise contains similar, albeit greater slopes to their best-fit lines, which may result from $\omega$ varying with the choice of action set.

\subsection{Semi-synthetic Experiments} To validate the performance of our estimator on more realistic data, we simulate the task of recommending movies to users on the MovieLens 25M dataset~\citep{lam2006movielens, HarperJoseph2016}, as well as recommending music to users based on the digital music reviews subset of the Amazon Reviews dataset~\citep{hou2024bridging}. The MovieLens 25M dataset consists of 25 million ratings across 160,000 users and 60,000 movies, while the Amazon Reviews digital music dataset contains 130,000 ratings across 101,000 users and 70,000 songs. We follow a setup similar to that of \citet{zhu2022robust}. To construct an action set and $\theta^*$, we randomly sample $u = 6{,}000$ users, $m = 4{,}000$ items, and their corresponding ratings from each dataset. We then factorize $R \in \mathbb{R}^{u \times m}$, the matrix of ratings for each user and item, using Alternating Least Squares. This yields matrices $U \in \mathbb{R}^{u \times d}$ and $M \in \mathbb{R}^{m \times d}$ with $UM^\top \approx R$. Each row of $M$ is thus a $d$-dimensional embedding of an item, while each row of $U$ serves as the reward parameter of a user. We then simulate a user's choices and ratings by randomly sampling a reward parameter $\theta^* = U_i$ and running \Cref{alg:practical_phased_elim} for $6$ phases with the rows of $M$ as the set of arms. Afterward, we estimate the user's reward parameter via \Cref{alg:our_inverse_estimator}. For a fixed dimension $d$, we repeat this for ten randomly selected users and average the relative error of $\hat{\theta}$ to obtain each entry of~\Cref{tab:movielens}, and we repeat the entire experiment for $d \in \{2, 4, 6, 8\}$.
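As in the simulations, estimation error generally grows with the dimension of the action set: the forward error increases monotonically on both datasets, and the inverse error follows the same overall trend with minor exceptions (from $d = 4$ to $d = 6$ on MovieLens and from $d = 6$ to $d = 8$ on Amazon Reviews).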
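The factorization step above can be sketched with a plain regularized ALS loop. This is a minimal NumPy sketch, not the implementation used in our experiments: it assumes a dense rating array \texttt{R} with a boolean array \texttt{mask} marking observed entries, and the regularization weight \texttt{lam}, the iteration count, and the random initialization are illustrative assumptions. Each half-step fixes one factor and solves a ridge regression for every row of the other factor using only observed ratings, so that $UM^\top$ approximates $R$ on the observed entries.

```python
import numpy as np

def als(R, mask, d, lam=0.1, n_iters=20, seed=0):
    """Regularized ALS: return U (u x d) and M (m x d) with U @ M.T close to R where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, d))
    M = rng.normal(scale=0.1, size=(n_items, d))
    reg = lam * np.eye(d)
    for _ in range(n_iters):
        # Fix M; each user's row of U solves a ridge regression over that user's observed ratings.
        for i in range(n_users):
            obs = mask[i]
            M_obs = M[obs]
            U[i] = np.linalg.solve(M_obs.T @ M_obs + reg, M_obs.T @ R[i, obs])
        # Fix U; each item's row of M solves a ridge regression over that item's observed ratings.
        for j in range(n_items):
            obs = mask[:, j]
            U_obs = U[obs]
            M[j] = np.linalg.solve(U_obs.T @ U_obs + reg, U_obs.T @ R[obs, j])
    return U, M

# Usage on a small synthetic rank-3 matrix with roughly 70% of entries observed.
rng = np.random.default_rng(1)
R_true = rng.normal(size=(50, 3)) @ rng.normal(size=(40, 3)).T
mask = rng.random(R_true.shape) < 0.7
U, M = als(R_true, mask, d=3, lam=1e-3, n_iters=50)
arms = M                                  # each row: a d-dimensional item embedding
theta_star = U[rng.integers(U.shape[0])]  # one user's reward parameter
```

The ridge term keeps each linear system well posed even if a user or item has few (or no) observed ratings.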
% The results in \Cref{tab:movielens} contain the same trend of increased error with higher dimensional action sets.
%, just as the synthetic datasets and \Cref{thm:accuracy_theta_est} demonstrate. 
% We see that as the dimension increases, our inverse estimation error increases, as similarly seen in the synthetic experiments. While the inverse error is notably more than the forward error, this is to be expected as inverse learning is significantly more difficult than forward learning. We notice that the estimation error is larger than in the synthetic experiments. This is likely due to the nonlinear relationship between user ratings and the movie features, which we assume to be linear for the sake of this experiment. 


% \begin{wrapfigure}{R}{0.5 \textwidth}
\begin{table}

\centering
\begin{sc}
\begin{tabular}{@{}lcccc@{}}
\toprule
 & \multicolumn{2}{c}{MovieLens} & \multicolumn{2}{c}{Amazon Reviews} \\ \cmidrule(rl){2-3} \cmidrule(rl){4-5}
$d$ & Inverse & Forward & Inverse & Forward \\ \midrule
$2$ & $0.2859$ & $0.0037$ & $0.1250$ & $0.0018$ \\
$4$ & $0.3666$ & $0.0356$ & $0.3646$ & $0.0081$ \\
$6$ & $0.3641$ & $0.1401$ & $0.4291$ & $0.3660$ \\
$8$ & $0.5030$ & $0.4632$ & $0.3955$ & $0.5949$ \\ \bottomrule
\end{tabular}
\end{sc}
\caption{Relative error of the inverse estimator and of the forward algorithm's estimate on MovieLens 25M and on the digital music subset of Amazon Reviews, averaged over ten randomly selected users per entry.}
\label{tab:movielens}
\end{table}

% \end{wrapfigure}


% This relationship is most apparent in \cref{fig:l2 perf}. In \cref{fig:l1 perf} and \cref{fig:l5 perf}, this dependence in $d$ is less clear but still visible. For example, in \cref{fig:l5 perf}, the root $d$ dependence is visible for all but the final and first phases. This trend demonstrates empirical verification of \cref{thm:accuracy_theta_est}. Furthermore, across all \cref{fig:l1 perf}, \cref{fig:l2 perf}, and \cref{fig:l5 perf}, our algorithm vastly outperforms the random inverse estimator, as expected. Therefore, our inverse estimator is far more accurate than this random estimator across all action set settings.
