\section{Conclusion}\label{sec:conclusion}

In this paper, we present the first comprehensive analysis of estimators that rely on IW regularization, an increasingly popular approach in OPE and OPL. Our results hold for a broad spectrum of IW regularization methods and also apply to standard IPS without any IW regularization. From our theoretical findings, we derive two learning principles that apply across IW regularizations. Our results suggest that, despite the numerous IW regularization techniques proposed for OPE, conventional methods such as clipping still perform very well in OPL.

Nevertheless, our work has three primary limitations. First, our bound includes empirical bias and variance terms, which makes it challenging to derive data-independent suboptimality gaps, as discussed in \cref{subsec:gbound}. Moreover, two-sided bounds for regularized IPS can be loose because they treat both tails identically, whereas some studies indicate that the lower and upper tails of such estimators behave differently. Proving generic bounds that treat each side of the inequality separately, as in \citet{gabbianelli2023importance}, could address this issue and is left for future research. Second, optimizing our bound relies on the reparametrization trick and Monte Carlo estimation, which may not scale well to high-dimensional problems. In addition, we applied the reparametrization trick only to the simple linear-softmax policies defined in \eqref{eq:softmax_pac_bayes}. This limitation can be mitigated by considering linear IW regularization techniques, as discussed in \cref{corr:lin_reg_main}, but there remains room for better practical optimization of the bound under non-linear IW regularizations; exploring more advanced optimization techniques is therefore an interesting direction for future research. Finally, our experiments could be extended to more complex policies and more challenging settings, such as recommender systems with large action spaces, which could further highlight the impact of IW regularization.