Causal Eligibility Traces for Confounding Robust Off-Policy Evaluation

Published: 07 May 2025, Last Modified: 13 Jun 2025UAI 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Causal Inference, Graphical Models
TL;DR: This paper proposes two novel algorithms using eligibility traces that correctly bound value functions of a target policies from confounded observational data generated by a different behavior policy.
Abstract: A unifying theme in Artificial Intelligence is learning an effective policy to control an agent in an unknown environment in order to optimize a certain performance measure. Off-policy methods can significantly improve sample efficiency during training, since they allow an agent to learn from observed trajectories generated by different behavior policies, without directly deploying target policies in the underlying environment. This paper studies off-policy evaluation from biased offline data where (1) unobserved confounding bias cannot be ruled out a priori; or (2) the observed trajectories do not overlap with intended behaviors of the learner, i.e., the target and behavior policies do not share a common support. Specifically, we extend Bellman's equation to derive effective closed-form bounds over value functions from the observational distribution contaminated with unobserved confounding and no overlap. Second, we propose two novel algorithms that use eligibility traces to estimate these bounds from finite observational data. Compared to other methods for robust off-policy evaluation in sequential environments, these methods are model-free and extend, for the first time, the well-celebrated temporal difference algorithms (Sutton, 1988) to biased offline data with unobserved confounding and no overlap.
Latex Source Code: zip
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission213/Authors, auai.org/UAI/2025/Conference/Submission213/Reproducibility_Reviewers
Submission Number: 213
Loading