Bootstrap Your Conversions: Thompson Sampling for Partially Observable Delayed Rewards

Marco Gigli; Fabio Stella

Published: 26 Apr 2024, Last Modified: 15 Jul 2024

Venue: UAI 2024 poster

License: CC BY 4.0

Keywords: stochastic bandits, delayed feedback, partially observable feedback, Thompson sampling

TL;DR: A novel, efficient approach to stochastic bandits with partially observable, delayed feedback

Abstract: This paper presents a novel approach to address contextual bandit problems with partially observable, delayed feedback by introducing an approximate Thompson sampling technique. This is a common setting, with applications ranging from online marketing to vaccine trials. Leveraging Bootstrapped Thompson sampling (BTS), we obtain an approximate posterior distribution over delay distributions and conversion probabilities, thereby extending an Expectation-Maximisation (EM) model to the Bayesian domain. Unlike prior methodologies, our approach does not overlook uncertainty on delays. Within the EM framework, we employ the Kaplan-Meier estimator to place no restriction on delay distributions. Through extensive benchmarking against state-of-the-art techniques, our approach demonstrates superior performance across the majority of tested environments, with comparable performance in the remaining cases. Furthermore, our method offers practical implementation using off-the-shelf libraries, facilitating broader adoption. Our technique lays a foundation for extending to other bandit settings, such as non-contextual bandits or action-dependent delay distributions, promising wider applicability and versatility in real-world applications.
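To make two ingredients named in the abstract concrete, here is a minimal sketch of a Kaplan-Meier estimate of the delay survival function from right-censored delays, together with a single bootstrap resample of the data of the kind Bootstrapped Thompson sampling relies on. It is an illustration under stated assumptions, not the authors' implementation: the function names `kaplan_meier` and `bootstrap_replicate` and the toy data are hypothetical, and the EM step that separates "will never convert" from "has not converted yet" is omitted.

```python
import random
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time, with S(t) = P(delay > t).

    durations[i] is the elapsed time for sample i; observed[i] is True if a
    conversion was seen at that time and False if the sample is censored
    (no conversion seen yet).
    """
    events = Counter(t for t, o in zip(durations, observed) if o)
    exits = Counter(durations)  # conversions and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def bootstrap_replicate(durations, observed, rng):
    """Resample (duration, observed) pairs with replacement (hypothetical helper)."""
    n = len(durations)
    idx = [rng.randrange(n) for _ in range(n)]
    return [durations[i] for i in idx], [observed[i] for i in idx]


# Toy data (made up for illustration): three conversions, three censored samples.
durations = [1, 2, 2, 3, 4, 5]
observed = [True, True, False, True, False, False]

curve = kaplan_meier(durations, observed)

# One bootstrap replicate gives one plausible survival curve; repeating this
# yields a spread of curves that stands in for posterior uncertainty on delays.
rng = random.Random(0)
boot_durations, boot_observed = bootstrap_replicate(durations, observed, rng)
boot_curve = kaplan_meier(boot_durations, boot_observed)
```

Because the Kaplan-Meier estimator is nonparametric, the sketch imposes no parametric form on the delay distribution, which matches the abstract's stated motivation for using it.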

Supplementary Material: zip

List Of Authors: Gigli, Marco and Stella, Fabio

Code Url: https://github.com/MarcoGigli/bootstrap-conversions

Submission Number: 264
