\section{Experiments}
\label{sec:experiments}
To verify our results empirically, we run our inverse estimator in several settings and measure the error of its estimate of $\theta^*$.

\subsection{Experiment Setup}
We aim to demonstrate that our inverse estimator is both accurate and sample-efficient. Across several experimental setups, we use our inverse learning algorithm to learn the reward function of the forward algorithm, and we measure its accuracy and the number of samples from the forward algorithm required to achieve that accuracy. Our empirical results corroborate our theoretical results.

\paragraph{Synthetic Data Experiments} The three environments are the unit $\ell_1$, $\ell_2$, and $\ell_5$ balls. For each environment, we randomly sample $\theta^*$ from the respective ball and take the action space to be a densely sampled subset of the same ball; each pair of $\theta^*$ and action space forms a bandit instance. For each environment, we sample bandit instances with vector dimensions ranging from $3$ to $8$. For each dimension and ball pair, we run Phased Elimination on $100$ sampled bandit instances, and then run the inverse estimator. We measure the relative error of $\hat{\theta}$, namely $\frac{\norm{\hat{\theta} - \theta^*}_2}{\norm{\theta^*}_2}$. We report our results in \Cref{fig:l1 perf}, \Cref{fig:l2 perf}, and \Cref{fig:l5 perf} as log-log plots, together with a linear fit, to corroborate the asymptotic behavior of the bounds given by \Cref{thm:accuracy_theta_est}.

\paragraph{Real Data Experiments} We also use the MovieLens 25M \cite{lam2006movielens, HarperJoseph2016} and Amazon Reviews \cite{hou2024bridging} datasets to examine the power of our Inverse Learning Framework on real data. This setup is largely motivated by \citet{zhu2022robust}. The MovieLens dataset contains over 160,000 users, 60,000 movies, and 25 million ratings, and the Amazon Reviews dataset consists of . We want to emulate the task, common to many recommender systems, of recommending movies to users by observing their choices. We therefore generate a linear dataset by first taking $u = 6{,}000$ users, $m = 4{,}000$ movies, and their corresponding ratings. We then perform matrix factorization on the ratings matrix $R \in \mathbb{R}^{u \times m}$, whose entries are the ratings of each user for each movie: we solve for matrices $U \in \mathbb{R}^{u \times d}$ and $M \in \mathbb{R}^{m \times d}$ with $UM^\top \approx R$ via Alternating Least Squares. To simulate the task of a recommender system for user $i$, we choose the reward parameter $\theta^* = U_i$, the user's preference vector. We then simulate the user choosing movies by running Phased Elimination, where the actions are the rows of $M$ (the embedding of each movie) and the rewards are the ratings of the chosen movies. Given only the user's choices under Phased Elimination, we use our Inverse Reinforcement Learning algorithm to estimate the reward parameter $\theta^*$. We repeat this for 10 randomly selected users and average the relative error of $\hat{\theta}$, and we do so for four values of $d$ ($d \in \{2, 4, 6, 8\}$). We report our numerical results in \Cref{tab:movielens}.
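
For concreteness, one common way to pose this factorization is as a regularized least-squares problem over the observed ratings. The index set $\Omega$ of observed (user, movie) pairs and the regularization weight $\lambda > 0$ are notation introduced here only for illustration; this is a sketch of the standard formulation, not necessarily the exact objective or hyperparameters used in our runs:
\[
  \min_{U, M} \; \sum_{(i,j) \in \Omega} \left( R_{ij} - U_i^\top M_j \right)^2 + \lambda \left( \norm{U}_F^2 + \norm{M}_F^2 \right).
\]
Alternating Least Squares fixes $M$ and solves the resulting least-squares problem for each row $U_i$ in closed form, then fixes $U$ and does the same for each row $M_j$, repeating until the objective stops improving. Under the assumed objective above, the update for a single user row is
\[
  U_i \leftarrow \Bigl( \sum_{j : (i,j) \in \Omega} M_j M_j^\top + \lambda I_d \Bigr)^{-1} \sum_{j : (i,j) \in \Omega} R_{ij} M_j ,
\]
and the update for each $M_j$ is symmetric.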

\begin{wrapfigure}{R}{0.5 \textwidth}
\centering

\begin{tabular}{@{}lcc@{}}
\toprule
\multicolumn{3}{c}{MovieLens Relative Error} \\ \midrule
$d$ & Inverse Error & Forward Error \\ \midrule
2 & 0.2859 & 0.0037 \\
4 & 0.3666 & 0.0356 \\
6 & 0.3641 & 0.1401 \\
8 & 0.5030 & 0.4632 \\ \bottomrule
\end{tabular}

% \begin{tabular}{|c|c|}
%     \hline
%     \multicolumn{2}{|c|}{MovieLens Relative Error} \\
%      \hline d & Inverse error \\
%      \hline 2 & 0.2859 \\
%      4 & 0.3666 \\
%      6 & 0.3641 \\
%      8 & 0.5030 \\
%      \hline
% \end{tabular}
\makeatletter\def\@captype{table}\makeatother
\caption{Relative error of the inverse estimator and of the forward algorithm on MovieLens 25M for each embedding dimension $d$.}
\label{tab:movielens}
\end{wrapfigure}

\subsection{Results}


Corresponding to the exponent on $T$ in \Cref{thm:accuracy_theta_est}, we expect the slope of each line in \cref{fig:three graphs} to be negative and largely unchanged across dimensions. As the legends in \cref{fig:l1 perf}, \cref{fig:l2 perf}, and \cref{fig:l5 perf} indicate, the fitted slopes are consistent across dimensions, in line with this prediction. Likewise, because of the $d$ term in the numerator of the bound in \Cref{thm:accuracy_theta_est}, we expect the intercept of each line to increase with $d$, which is what we observe for each $\ell_p$ ball.
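
As a reading aid for these plots, suppose purely for illustration that the error follows a power law in $T$ and $d$, with constants $a, b, c > 0$ that are assumed placeholders rather than the constants of \Cref{thm:accuracy_theta_est}:
\[
  \mathrm{err}(T, d) \approx c \, d^{a} \, T^{-b}
  \quad \Longrightarrow \quad
  \log_2 \mathrm{err}(T, d) \approx \left( \log_2 c + a \log_2 d \right) - b \log_2 T .
\]
Under this assumed form, a linear fit of $\log_2$ error against $\log_2 T$ has slope $-b$, which does not depend on $d$, and intercept $\log_2 c + a \log_2 d$, which grows with $d$.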

Further, the MovieLens results indicate that the inverse estimator still performs reasonably well on action sets more complicated than $\ell_p$ balls, albeit with worse performance than the forward algorithm.
% This relationship is most apparent in \cref{fig:l2 perf}. In \cref{fig:l1 perf} and \cref{fig:l5 perf}, this dependence in $d$ is less clear but still visible. For example, in \cref{fig:l5 perf}, the root $d$ dependence is visible for all but the final and first phases. This trend demonstrates empirical verification of \cref{thm:accuracy_theta_est}. Furthermore, across all \cref{fig:l1 perf}, \cref{fig:l2 perf}, and \cref{fig:l5 perf}, our algorithm vastly outperforms the random inverse estimator, as expected. Therefore, our inverse estimator is far more accurate than this random estimator across all action set settings.

% \end{document}
\begin{figure}
  \centering
  \subfigure[$\ell_1$ Ball as Action Set]{\includegraphics[width=0.48\textwidth]{figures/l1_round_err.png}
         \label{fig:l1 perf}}
    \hfill
  \subfigure[$\ell_2$ Ball as Action Set]{\includegraphics[width=0.48\textwidth]{figures/l2_round_err.png}
         \label{fig:l2 perf}}
    \\
\begin{minipage}{0.5\textwidth}
        \centering
        \subfigure[$\ell_5$ Ball as Action Set]{\includegraphics[width=\textwidth]{figures/l5_round_err.png}
         \label{fig:l5 perf}}
    \end{minipage}\hfill

  \caption{Estimator performance over the $\ell_1$, $\ell_2$, and $\ell_5$ balls across several dimensions. The axes show $\log_2$ error against $\log_2$ time, so a linear fit recovers the exponent of the trend between error and round.}
  
     \label{fig:three graphs}
\end{figure}
