Non-Stationary Causal Bandits

03 Oct 2022 (modified: 05 May 2023) · CML4Impact
Abstract: The causal bandit problem extends the conventional multi-armed bandit problem: the available arms are no longer independent of one another, but are related through a Bayesian graph. This extension is more natural, since everyday instances of bandits often exhibit causal relations among their actions and are therefore better represented as causal bandit problems. Moreover, the class of conventional multi-armed bandits lies within that of causal bandits, since any instance of the former can be modeled in the latter setting by a Bayesian graph in which all variables are independent. However, the probabilistic distributions in the Bayesian graph are generally assumed to be stationary. In this paper, we design non-stationary causal bandit algorithms by equipping the current state of the art (mainly causal UCB, causal Thompson Sampling, causal KL-UCB and Online Causal TS) with the restarted Bayesian online change-point detector [RBOCPD]. Experimental results show that regret is minimized when change-points are detected optimally.