Keywords: adaptive experimentation, model predictive control, A/B Testing
Abstract: Implementing adaptive experimentation methods in the real world often encounters a multitude of operational difficulties, including batched/delayed feedback, non-stationary environments, and constraints on treatment allocations. To improve the flexibility of adaptive experimentation, we propose a Bayesian, optimization-based framework founded on model-predictive control (MPC) for the linear contextual bandit setting. While we focus on simple regret minimization, the framework can flexibly incorporate multiple objectives along with constraints, batches, personalized and non-personalized policies, as well as predictions of future context arrivals. Most importantly, it maintains this flexibility while guaranteeing improvement over non-adaptive A/B testing across all time horizons, and empirically outperforms standard policies such as Thompson Sampling. Overall, this framework offers a way to guide adaptive designs across the varied demands of modern large-scale experiments.
Submission Number: 52
Loading