\section{Discussion}
\label{Sec:Discussion}
We have presented an inverse reinforcement learning algorithm for the setting of linear stochastic bandits and established guarantees on its convergence behavior as a function of the length of the demonstrator's trajectory. We empirically verified the efficacy of our algorithm in both simulated and semi-synthetic settings. Moreover, we proved a lower bound on the best error achievable by any inverse learner. An interesting future direction is to extend this framework to nonlinear reward functions and more general bandit settings.

%\paragraph{Limitations}
A fundamental limitation of our work, even in the linear bandit setting, is that we restrict the demonstrator to the canonical Phased Elimination algorithm. Moreover, our analysis places assumptions on the density and geometry of the action set; weakening these assumptions is an important direction for future work.