Non-Stationary Structural Causal Bandits

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Non-stationarity, Sequential Decision Making
TL;DR: We propose a non-stationary structural causal bandit framework that models intervention propagation over time via temporal causal models, and introduce POMIS$^+$ to identify non-myopic strategies maximizing long-term rewards.
Abstract: We study the problem of sequential decision-making in environments governed by evolving causal mechanisms. Prior work on structural causal bandits (formulations that integrate causal graphs into multi-armed bandit problems to guide intervention selection) has shown that leveraging the causal structure can reduce unnecessary interventions by identifying possibly-optimal minimal intervention sets (POMISs). However, such formulations fall short in dynamic settings where reward distributions may vary over time: their static, hence myopic, nature focuses on immediate rewards and overlooks the long-term effects of interventions. In this work, we propose a non-stationary structural causal bandit framework that leverages temporal structural causal models to capture evolving dynamics over time. We characterize how interventions propagate over time by developing graphical tools and assumptions, which form the basis for identifying non-myopic intervention strategies. Within this framework, we devise POMIS$^+$, which identifies variables that contribute to maximizing both immediate and long-term rewards. Our framework provides a principled way to reason about temporally aware interventions by explicitly modeling information propagation across time. Empirical results validate the effectiveness of our approach, demonstrating improved performance over myopic baselines.
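
To make the myopic-versus-non-myopic distinction concrete, below is a minimal Python sketch of a toy non-stationary setting. The two-variable temporal SCM, its coefficients, and the two fixed policies are hypothetical illustrations, not the paper's POMIS$^+$ procedure: today's intervention A_t propagates into tomorrow's covariate X_{t+1}, so the action that maximizes the immediate reward E[Y_t | do(A_t)] differs from the one that maximizes long-run average reward.

import numpy as np

# Minimal sketch (hypothetical model, not the paper's POMIS^+ procedure):
# a temporal SCM in which today's intervention A_t propagates into
# tomorrow's covariate via X_{t+1} = 0.8 * A_t + noise, while the
# per-step reward is Y_t = A_t - 0.5 * A_t^2 + X_t + noise.

def simulate(policy, T=500, seed=0):
    rng = np.random.default_rng(seed)
    a_prev, total = 0.0, 0.0
    for t in range(T):
        x = 0.8 * a_prev + rng.normal(scale=0.1)  # carryover of the last intervention
        a = policy(x)
        y = a - 0.5 * a * a + x + rng.normal(scale=0.1)
        total += y
        a_prev = a
    return total / T  # average per-step reward

# Myopic: argmax_a E[Y_t | do(A_t = a)] solves 1 - a = 0, i.e. a = 1
# (it ignores the carryover into X_{t+1}).
# Non-myopic: each unit of a also adds 0.8 to the next step's reward
# through X_{t+1}, so the undiscounted optimum solves 1 - a + 0.8 = 0,
# i.e. a = 1.8.
print("myopic     :", simulate(lambda x: 1.0))
print("non-myopic :", simulate(lambda x: 1.8))

Running this sketch, the non-myopic policy's average reward (about 1.62) exceeds the myopic one's (about 1.3), illustrating why accounting for intervention propagation over time pays off in such settings.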
Supplementary Material: zip
Primary Area: Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
Submission Number: 27658