Keywords: Causal Inference, Counterfactual Target Achievement, Sequential Decision Making
Abstract: Identifying optimal treatment sequences from offline data to guide temporal systems toward target outcomes is a critical challenge with profound implications for fields like personalized medicine. While existing methods are mostly evaluated in offline settings, practical applications demand online, adaptive strategies that can respond in real time. To address this, we propose \textbf{G}oal-aware \textbf{I}ntervention via \textbf{F}actual-\textbf{T}arget Training (\textbf{GIFT}), a novel framework for learning sequential treatment policies from observational data. GIFT learns a goal-conditioned policy by applying variance-controlled importance weights, which serve as our reward-rescaling mechanism, to guide patient trajectories toward a desired target. Theoretically, our algorithm is guaranteed to converge, and we characterize its induced approximation bias by bounding the gap between our solution and the learned policy's true value. Experiments show that GIFT significantly outperforms existing methods in producing goal-oriented policies for online deployment.
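The abstract does not spell out how the variance-controlled importance weights are computed, so the following is only a minimal illustrative sketch, assuming a common construction: per-step importance ratios between the learned goal-conditioned policy and the behavior policy, clipped to bound variance and then self-normalized before rescaling goal-proximity rewards. The function names, the clipping threshold, and the toy numbers are all hypothetical, not taken from the paper.

```python
import numpy as np

def variance_controlled_weights(pi_new, pi_beta, clip=10.0):
    """Per-step importance ratios pi_new / pi_beta, clipped to limit
    variance, then self-normalized so the weights sum to 1.
    (Assumed construction; the paper may use a different scheme.)"""
    w = np.clip(pi_new / pi_beta, 0.0, clip)
    return w / w.sum()

def rescaled_goal_reward(rewards, pi_new, pi_beta, clip=10.0):
    """Weighted average of goal-proximity rewards under the new policy,
    estimated from trajectories logged by the behavior policy."""
    w = variance_controlled_weights(pi_new, pi_beta, clip)
    return float(np.sum(w * rewards))

# Toy example with three logged transitions (hypothetical values):
rewards = np.array([1.0, 0.5, 0.0])   # e.g. closeness to the target outcome
pi_new  = np.array([0.6, 0.3, 0.1])   # learned policy's action probabilities
pi_beta = np.array([0.2, 0.5, 0.3])   # behavior policy's action probabilities
est = rescaled_goal_reward(rewards, pi_new, pi_beta)
```

Clipping trades a small bias for a large variance reduction, which is the usual motivation for "variance-controlled" weights in offline policy learning; the paper's bias bound presumably characterizes exactly this kind of gap.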
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 8187