Abstract: The causal bandit problem setting is a sequential decision-making framework in which the actions of interest correspond to interventions on variables in a system assumed to be governed by a causal model. The underlying causal structure may be exploited when investigating actions in order to optimize the reward variable. Most existing approaches assume prior knowledge of the underlying causal graph, which is restrictive in practice and often unrealistic. In this paper, we develop a novel Bayesian framework for tackling causal bandit problems that does not require the causal graph to be known, but rather simultaneously learns the causal graph while exploiting causal inferences to optimize the reward. Our methods efficiently utilize joint inferences from interventional and observational data in a unified Bayesian model constructed with intervention calculus and causal graph learning. For the implementation of our proposed methodology in the discrete distributional setting, we derive an approximation of the sampling variance of the backdoor adjustment estimator. In the Gaussian setting, we characterize the interventional variance with intervention calculus and propose a simple graphical criterion for sharing information between arms. We validate our proposed methodology in an extensive empirical study, demonstrating compelling cumulative regret performance against state-of-the-art standard algorithms as well as optimistic implementations of their causal variants that assume strong prior knowledge of the causal structure.
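As context for the backdoor adjustment estimator referenced in the abstract, the standard backdoor adjustment identity (the paper's specific variance approximation is not reproduced here) states that, for an adjustment set $Z$ satisfying the backdoor criterion relative to an intervention variable $X$ and reward $Y$,
$$ P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z), \qquad \mathbb{E}[Y \mid \mathrm{do}(x)] = \sum_{z} \mathbb{E}[Y \mid x, z]\, P(z). $$
In the discrete setting, the backdoor adjustment estimator plugs empirical frequencies into these expressions, and the paper's contribution is an approximation of that plug-in estimator's sampling variance.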
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Prepared Camera Ready Revision.
Added discussion on Bayesian bandits in Section 1.
Added additional MCMC experimental details in Appendix B.
Various minor changes.
Code: https://github.com/jirehhuang/bcb
Assigned Action Editor: ~Branislav_Kveton1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 511