Batch Bayesian Optimization of Delayed Effects Corrections for Thompson Sampling Bandits: A Practical Tuning Algorithm for Adaptive Interventions

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Bayesian Optimization, Thompson Sampling, Reinforcement Learning, Adaptive Interventions
TL;DR: Bayesian Optimization of Delayed Effects Corrections for Thompson Sampling Bandits: A Practical Tuning Algorithm for Adaptive Interventions
Abstract: When the number of reinforcement learning episodes that can be performed to optimize a policy is severely limited, the bias-variance trade-off of bandit algorithms such as Thompson Sampling can be significantly better than that of policy-gradient and value-function-based methods. However, bandits have no ability to model the delayed effects of actions. In this paper, we develop a batch Bayesian optimization algorithm that learns a delayed effect correction for linear Thompson Sampling bandits. This work is motivated by the problem of tuning adaptive intervention policies, where each episode corresponds to a costly and often lengthy trial involving human subjects. We show through extensive experiments in an adaptive intervention simulation environment that the proposed approach can find beneficial delayed effect correction terms under realistic constraints on the number of Bayesian optimization rounds and the batch size per round.
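To make the setup concrete, below is a minimal sketch (not the authors' algorithm) of the general pattern the abstract describes: an outer batch Bayesian optimization loop tunes a scalar correction `lam` that a linear Thompson Sampling bandit applies to its observed rewards before updating its posterior. The `DelayedEnv` toy environment, the additive correction form `r + lam * delayed`, and all hyperparameters are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class DelayedEnv:
    """Toy contextual bandit where each action also has a delayed
    effect on the return (purely illustrative)."""
    def __init__(self, d=3, n_arms=4):
        self.d, self.n_arms = d, n_arms
        self.theta = rng.normal(size=d)   # immediate-reward weights
        self.phi = rng.normal(size=d)     # delayed-effect weights
    def contexts(self):
        return rng.normal(size=(self.n_arms, self.d))
    def step(self, X, a):
        r = X[a] @ self.theta + 0.1 * rng.normal()
        delayed = X[a] @ self.phi         # proxy for the delayed effect
        return r, delayed

def episode_return(lam, horizon=50, d=3, v=1.0):
    """One linear TS episode; `lam` scales an assumed additive
    delayed-effect correction applied before the posterior update."""
    env = DelayedEnv(d)
    B, f, total = np.eye(d), np.zeros(d), 0.0
    for _ in range(horizon):
        mu = np.linalg.solve(B, f)
        theta = rng.multivariate_normal(mu, v * np.linalg.inv(B))
        X = env.contexts()
        a = int(np.argmax(X @ theta))           # Thompson-sampled action
        r, delayed = env.step(X, a)
        B += np.outer(X[a], X[a])
        f += (r + lam * delayed) * X[a]         # corrected reward target
        total += r + delayed                    # true return incl. delay
    return total

def gp_posterior(Xtr, ytr, Xte, ls=0.3, sn=1.0):
    """Standard RBF-kernel GP posterior mean/std over candidate lams."""
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ls**2)
    K = k(Xtr, Xtr) + sn**2 * np.eye(len(Xtr))
    Ks, Kss = k(Xtr, Xte), k(Xte, Xte)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    W = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(W**2, axis=0)
    return Ks.T @ alpha, np.sqrt(np.maximum(var, 1e-12))

# Outer batch BO loop: few rounds with small batches, mirroring the
# constrained setting where each evaluation is a costly trial.
cand = np.linspace(-1.0, 2.0, 61)               # candidate lam values
lams = list(rng.uniform(-1.0, 2.0, size=3))     # initial design
ys = [episode_return(l) for l in lams]
for _ in range(5):                              # BO rounds
    mean, std = gp_posterior(np.array(lams), np.array(ys), cand)
    batch = cand[np.argsort(mean + std)[-3:]]   # naive UCB batch of 3
    for l in batch:                             # trials run in parallel
        lams.append(l); ys.append(episode_return(l))
print("best lambda:", lams[int(np.argmax(ys))])
```

The naive top-k UCB batch selection above can pick near-duplicate candidates; a practical batch acquisition rule (e.g., one that enforces batch diversity) would replace it, but the overall loop structure, where each "round" corresponds to a batch of concurrent trials, is the point of the sketch.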
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6774