Keywords: batched bandits, off-policy evaluation, higher-order inference, asymptotic expansion
TL;DR: We generalize the asymptotic approximation approach to off-policy evaluation in batched bandits from first-order approximation to general order by using asymptotic expansions.
Abstract: Adaptive experiments have been gaining traction in a variety of domains, stimulating a growing literature on post-experimental statistical inference for data collected from such designs. Prior work constructs confidence intervals mainly through two types of methods: (i) martingale concentration inequalities and (ii) asymptotic approximations to the distribution of test statistics; this work contributes to the second kind. Current asymptotic approximation methods, however, mostly rely on first-order limit theorems, which can converge slowly in data-poor regimes. Moreover, established results often assume that the noise is well behaved, which can be problematic when real-world instances are heavy-tailed or asymmetric. In this paper, we propose a higher-order asymptotic expansion formula for inference on adaptively collected data, which generalizes the normal approximation to the distribution of standard test statistics. Our theorem relaxes assumptions on the noise distribution and provides a higher-order approximation in distributional distance, which accommodates small sample sizes. We complement our theoretical results with simulations that show promising empirical performance.