Graph Identification and Upper Confidence Evaluation for Causal Bandits with Linear Models

Published: 14 Apr 2024, Last Modified: 02 Oct 2024 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: This paper investigates the causal bandit problem, in which the objective is to select an optimal sequence of interventions on nodes in a graph. The interventions are optimized by exploiting the causal relationships among the nodes whose signals contribute to the reward. First, a method for learning the directed acyclic graph is proposed that substantially reduces sample complexity relative to the prior art; it adopts a novel mutual-information-based edge detection strategy that learns sub-graphs. The graph is assumed to be governed by linear structural equations, and the distribution of the interventions is assumed to be unknown. Under the further assumptions of Gaussian exogenous inputs and minimum mean-squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound based intervention-selection rule that optimizes the reward. Numerical results compare the new methodology against existing schemes and show a substantial performance improvement.
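To make the setting concrete, below is a minimal illustrative sketch, not the paper's algorithm or its tailored uncertainty bound: a two-node linear SEM with Gaussian exogenous noise, where each intervention sets the parent node to a fixed value, the edge weight is estimated by least squares (the MMSE estimate under the Gaussian assumptions), and interventions are chosen by a generic UCB-style rule on the predicted reward. All variable names, constants, and the exploration bonus are assumptions made for illustration.

```python
# Illustrative sketch only (assumed setup, not the paper's method): linear SEM
# X -> Y with Gaussian exogenous noise, interventions do(X = a), least-squares
# weight estimation, and a generic UCB rule over a finite set of interventions.
import numpy as np

rng = np.random.default_rng(0)

true_w = 1.5                            # unknown edge weight X -> Y (assumed)
noise_std = 1.0                         # std of the Gaussian exogenous input on Y
actions = np.array([-1.0, 0.5, 2.0])    # candidate intervention values for do(X = a)
T = 500                                 # horizon

xs, ys = [], []                         # interventional samples used to estimate the weight
counts = np.zeros(len(actions))

for t in range(T):
    if t < len(actions):
        k = t                           # try each intervention once to initialize
    else:
        x = np.asarray(xs)
        y = np.asarray(ys)
        w_hat = (x @ y) / (x @ x)       # least-squares / MMSE weight estimate
        # Generic UCB: predicted mean reward plus an exploration bonus that
        # shrinks with the number of times each intervention has been tried.
        bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
        k = int(np.argmax(w_hat * actions + bonus))
    a = actions[k]
    reward = true_w * a + noise_std * rng.standard_normal()  # sample Y under do(X = a)
    xs.append(a)
    ys.append(reward)
    counts[k] += 1

x, y = np.asarray(xs), np.asarray(ys)
print("estimated weight:", (x @ y) / (x @ x))
print("pulls per intervention:", counts)
```

In the paper's setting the graph structure is also unknown and must be learned, and the exploration bonus is replaced by the derived uncertainty bound; this sketch only illustrates the reward model and the UCB-style selection loop.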