Bootstrap Your Conversions: Thompson Sampling for Partially Observable Delayed Rewards

Published: 26 Apr 2024, Last Modified: 15 Jul 2024, UAI 2024 poster, CC BY 4.0
Keywords: stochastic bandits, delayed feedback, partially observable feedback, Thompson sampling
TL;DR: A novel, efficient approach to stochastic bandits with partially observable, delayed feedback
Abstract: This paper presents a novel approach to contextual bandit problems with partially observable, delayed feedback by introducing an approximate Thompson sampling technique. This is a common setting, with applications ranging from online marketing to vaccine trials. Leveraging Bootstrapped Thompson sampling (BTS), we obtain an approximate posterior distribution over delay distributions and conversion probabilities, thereby extending an Expectation-Maximisation (EM) model to the Bayesian domain. Unlike prior methodologies, our approach does not overlook uncertainty about delays. Within the EM framework, we employ the Kaplan-Meier estimator so that no restriction is placed on delay distributions. In extensive benchmarks against state-of-the-art techniques, our approach performs better in the majority of tested environments and comparably in the remaining cases. Furthermore, our method can be implemented with off-the-shelf libraries, facilitating broader adoption. Our technique lays a foundation for extensions to other bandit settings, such as non-contextual bandits or action-dependent delay distributions, promising wider applicability and versatility in real-world applications.
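To illustrate the non-parametric ingredient mentioned in the abstract, the following is a minimal sketch of the standard Kaplan-Meier estimator for right-censored durations, written in plain Python. It is not the authors' implementation: the function name `kaplan_meier` and the toy data are assumptions made for illustration, and the sketch covers neither the EM step (which handles conversions that may never arrive) nor the bootstrap used to approximate the posterior.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve for right-censored data.

    durations: time at which each unit either had its event or was censored.
    observed:  True if the event was seen at that time, False if censored.
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    # Number of observed events at each time.
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    # Number of units leaving the risk set at each time (events + censorings).
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit update: S(t) = S(t-) * (1 - d_t / n_t).
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Units censored at t are still at risk at t, then leave afterwards.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: delays in days, False marks a censored unit.
    delays = [1, 2, 2, 3, 4]
    seen = [True, True, False, True, False]
    for t, s in kaplan_meier(delays, seen):
        print(f"t={t}  S(t)={s:.3f}")
```

Because the estimator only multiplies empirical conditional survival fractions, it assumes no parametric family for the delay distribution, which is the property the abstract relies on.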
Supplementary Material: zip
List Of Authors: Gigli, Marco and Stella, Fabio
Latex Source Code: zip
Signed License Agreement: pdf
Code Url: https://github.com/MarcoGigli/bootstrap-conversions
Submission Number: 264