Keywords: Causal inference, Multi-armed Bandits, Graphical models, Sequential decision making
Abstract: We study the causal bandit problem when the causal graph is unknown and develop
an efficient algorithm for finding the parent node of the reward node using
atomic interventions. We derive an exact expression for the expected number of
interventions performed by the algorithm and show that, under certain graphical
conditions, this number grows only logarithmically in the number of variables,
while under more general assumptions it grows faster but still sublinearly.
We formally show that our algorithm is optimal, as it attains the universal lower
bound we establish for any algorithm that performs atomic interventions.
Finally, we extend our algorithm to the case where the reward node has multiple
parents. Combining this algorithm with a standard algorithm from the bandit
literature yields improved regret bounds.
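The abstract does not spell out the algorithm itself. As a toy illustration only (not the paper's method), the sketch below shows how atomic interventions can locate the reward node's parent with logarithmically many interventions under one strong, assumed structure: the variables form a single directed chain X_0 -> ... -> X_{n-1}, the reward has exactly one parent X_k on that chain, and a hypothetical noiseless oracle `affects_reward(i)` reports whether intervening on X_i changes the reward distribution (true exactly for the ancestors X_0..X_k). The function name `find_parent_on_chain` is likewise hypothetical.

```python
from typing import Callable, Tuple


def find_parent_on_chain(n: int, affects_reward: Callable[[int], bool]) -> Tuple[int, int]:
    """Binary search for the reward node's parent on a directed chain.

    Toy assumptions (hypothetical, not taken from the paper):
      * variables form a chain X_0 -> X_1 -> ... -> X_{n-1};
      * the reward has exactly one parent X_k with 0 <= k < n;
      * affects_reward(i) answers, without noise, whether the atomic
        intervention do(X_i = x) changes the reward distribution, which
        holds exactly when X_i is an ancestor of the reward (i <= k).

    Returns (index of the parent, number of interventions used).
    """
    lo, hi = 0, n - 1  # invariant: the parent index k lies in [lo, hi]
    interventions = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2  # upper midpoint so the range always shrinks
        interventions += 1  # one atomic intervention on X_mid
        if affects_reward(mid):
            lo = mid  # X_mid is an ancestor, so the parent is X_mid or later
        else:
            hi = mid - 1  # X_mid is downstream of the parent
    return lo, interventions


if __name__ == "__main__":
    k = 700  # hidden parent index, chosen for the demo
    parent, used = find_parent_on_chain(1024, lambda i: i <= k)
    print(parent, used)  # prints "700 10"
```

Each intervention halves the set of candidate parents, so this toy setting needs at most ceil(log2 n) interventions; the general graphs treated in the paper are not covered by this sketch.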
Supplementary Material: pdf
Submission Number: 53