Debiased Machine Learning and Network Cohesion for Doubly-Robust Differential Reward Models in Contextual Bandits

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Contextual Bandits, Mobile Health, Doubly Robust, Double Machine Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: A common approach to learning mobile health (mHealth) intervention policies is linear Thompson sampling. Two desirable features of an mHealth policy are (1) pooling information across individuals and time and (2) modeling the differential reward with a linear model while allowing a time-varying baseline reward. Previous approaches pooled information across individuals but not time, thereby failing to capture trends in treatment effects over time. In addition, these approaches did not explicitly model the baseline reward, which limited the ability to precisely estimate the parameters of the differential reward model. In this paper, we propose a novel Thompson sampling algorithm, termed "DML-TS-NNR", that leverages (1) nearest neighbors to efficiently pool information on the differential reward function across users $\textit{and}$ time and (2) the Double Machine Learning (DML) framework to explicitly model baseline rewards while remaining agnostic to the supervised learning algorithms used. By explicitly modeling baseline rewards, we obtain smaller confidence sets for the differential reward parameters. We offer theoretical guarantees on the pseudo-regret, which are supported by empirical results. Importantly, the DML-TS-NNR algorithm demonstrates robustness to potential misspecifications in the baseline reward model.
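To make the starting point concrete, below is a minimal sketch of the standard linear Thompson sampling baseline that the abstract refers to (not the proposed DML-TS-NNR algorithm). It maintains a Gaussian posterior over a single linear reward parameter, samples from it each round, and acts greedily on the sample. The function name `linear_thompson_sampling`, the fixed arm set, and all parameter values are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_thompson_sampling(arms, true_theta, n_rounds, sigma2=1.0, lam=1.0):
    """Illustrative linear Thompson sampling over a fixed set of feature
    vectors (arms). Each round: sample theta from the Gaussian posterior,
    pick the arm maximizing the sampled linear reward, observe a noisy
    reward, and update the posterior. Returns cumulative pseudo-regret."""
    d = true_theta.shape[0]
    A = lam * np.eye(d)          # posterior precision matrix
    b = np.zeros(d)              # accumulated reward-weighted features
    best_value = max(x @ true_theta for x in arms)
    cum_regret = 0.0
    for _ in range(n_rounds):
        cov = np.linalg.inv(A)
        theta_hat = cov @ b                                   # posterior mean
        theta_sample = rng.multivariate_normal(theta_hat, sigma2 * cov)
        x = arms[int(np.argmax([a @ theta_sample for a in arms]))]
        reward = x @ true_theta + rng.normal(scale=np.sqrt(sigma2))
        A += np.outer(x, x)                                   # posterior update
        b += reward * x
        cum_regret += best_value - x @ true_theta
    return cum_regret

# Toy example: three arms, one clearly best under the true parameter.
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
true_theta = np.array([0.2, 0.8])
regret = linear_thompson_sampling(arms, true_theta, n_rounds=500)
```

The paper's contribution modifies this template in two ways: the pooled differential-reward parameter is regularized toward network cohesion across users and time (the "NNR" component), and the baseline reward is estimated separately via DML so that only the differential reward needs a well-specified linear model.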
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3027