A Causal Bandit Approach to Learning Good Atomic Interventions in Presence of Unobserved Confounders
Keywords: Causal Bandits, Causality, Causal Inference, Multi-Armed Bandits, Best Arm Identification, Unobserved Confounders, Simple Regret, Cumulative Regret
TL;DR: We design optimal algorithms to learn good interventions in causal graphs by applying bandit-based techniques.
Abstract: We study the problem of determining the best atomic intervention in a Causal Bayesian Network (CBN) specified only by its causal graph. We model this as a stochastic multi-armed bandit (MAB) problem with side-information, where interventions on the CBN correspond to arms of the bandit instance. First, we propose a simple regret minimization algorithm that takes as input a causal graph with observable and unobservable nodes and, in $T$ exploration rounds, achieves $\tilde{O}(\sqrt{m(\mathcal{C})/T})$ expected simple regret. Here $m(\mathcal{C})$ is a parameter dependent on the input CBN $\mathcal{C}$ and can be much smaller than the number of arms. We also show that this is almost optimal for CBNs whose causal graphs have an $n$-ary tree structure. Next, we propose a cumulative regret minimization algorithm that takes as input a causal graph with observable nodes and performs better than the optimal MAB algorithms that do not use causal side-information. We experimentally compare both our algorithms with the best known algorithms in the literature.
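To make the problem setting concrete, below is a minimal Python sketch of the causal-bandit simple-regret setup the abstract describes. It is not the authors' algorithm: it runs plain uniform exploration over atomic interventions on a hypothetical two-variable toy CBN and measures the resulting simple regret. All names here (`pull`, `uniform_exploration`, the toy reward structure) are illustrative assumptions.

```python
# Illustrative sketch only: uniform exploration over atomic interventions
# (arms) on a toy CBN, followed by a simple-regret estimate. The paper's
# algorithm additionally exploits the causal graph; this sketch does not.
import numpy as np

rng = np.random.default_rng(0)

# Toy CBN: binary variables X1, X2 influence a Bernoulli reward Y.
# Arms enumerate all atomic interventions do(Xi = v) plus "do nothing" (None).
ARMS = [None] + [(i, v) for i in range(2) for v in (0, 1)]

def pull(arm):
    """Sample a reward Y from the toy CBN under the given atomic intervention."""
    x = rng.integers(0, 2, size=2)           # observational values of X1, X2
    if arm is not None:
        i, v = arm
        x[i] = v                              # atomic intervention do(Xi = v)
    p_y = 0.2 + 0.5 * x[0] + 0.2 * x[1]       # P(Y = 1 | X1, X2)
    return float(rng.random() < p_y)

def uniform_exploration(T):
    """Pull arms round-robin for T rounds; return the empirically best arm."""
    counts = np.zeros(len(ARMS))
    sums = np.zeros(len(ARMS))
    for t in range(T):
        a = t % len(ARMS)
        sums[a] += pull(ARMS[a])
        counts[a] += 1
    return int(np.argmax(sums / counts))

# Simple regret = mu* - mu(returned arm), with arm means estimated by Monte Carlo.
mus = [np.mean([pull(a) for _ in range(20000)]) for a in ARMS]
best = uniform_exploration(T=2000)
print("simple regret ~", max(mus) - mus[best])
```

Uniform exploration needs budget scaling with the total number of arms; the point of the paper's bound is that exploiting the causal graph replaces that count with the potentially much smaller instance-dependent parameter $m(\mathcal{C})$.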
Supplementary Material: zip