Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

Ziyang Tang*; Yihao Feng*; Lihong Li; Dengyong Zhou; Qiang Liu

Published: 20 Dec 2019, Last Modified: 12 Oct 2025. ICLR 2020 Conference Blind Submission. Readers: Everyone

TL;DR: We develop a new doubly robust estimator based on the infinite-horizon density ratio and off-policy value estimation.

Abstract: Infinite-horizon off-policy policy evaluation is a highly challenging task due to the excessively large variance of typical importance sampling (IS) estimators. Recently, Liu et al. (2018) proposed an approach that significantly reduces the variance of infinite-horizon off-policy evaluation by estimating the stationary density ratio, but at the cost of introducing a potentially large bias due to errors in density ratio estimation. In this paper, we develop a bias-reduced augmentation of their method, which can take advantage of a learned value function to obtain higher accuracy. Our method is doubly robust in that the bias vanishes when either the density ratio or the value function estimate is perfect. More generally, when either of them is accurate, the bias is also reduced. Both theoretical and empirical results show that our method yields significant advantages over previous methods.

Keywords: off-policy evaluation, infinite horizon, doubly robust, reinforcement learning

Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/doubly-robust-bias-reduction-in-infinite/code)
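
The doubly robust property stated in the abstract can be checked numerically. The sketch below is an illustration under our own assumptions, not the authors' implementation: a hypothetical 2-state, 2-action MDP with known dynamics, the discounted setting with normalized value R(pi) = (1 - gamma) * E[sum_t gamma^t r_t], exact expectations in place of samples, and a standard doubly robust form that adds a density-ratio-weighted Bellman-residual correction to a value-function baseline: (1 - gamma) * E_{s0 ~ d0}[V(s0)] + E_{(s,a,s') ~ d_pi0}[w(s) * pi(a|s)/pi0(a|s) * (r + gamma * V(s') - Q(s,a))]. The helper names (`visitation`, `q_values`, `dr_estimate`) and all MDP numbers are hypothetical.

```python
# Toy check of the doubly robust property (illustrative assumptions, not the paper's code):
# a hypothetical 2-state, 2-action MDP, discounted setting, exact expectations instead of samples.
GAMMA = 0.9
S, A = 2, 2
D0 = [1.0, 0.0]                      # initial state distribution
R = [[1.0, 0.0], [0.0, 2.0]]         # R[s][a]: expected reward
P = [[[0.9, 0.1], [0.2, 0.8]],       # P[s][a][s2]: transition probabilities
     [[0.7, 0.3], [0.1, 0.9]]]
PI = [[0.2, 0.8], [0.5, 0.5]]        # target policy pi(a|s)
PI0 = [[0.6, 0.4], [0.7, 0.3]]       # behavior policy pi0(a|s)


def visitation(pol, iters=2000):
    """Normalized discounted state visitation, d = (1 - gamma) * sum_t gamma^t P(s_t)."""
    d = [0.0] * S
    for _ in range(iters):  # iterate d <- (1 - gamma) d0 + gamma * d P_pol to its fixed point
        d = [(1 - GAMMA) * D0[s2]
             + GAMMA * sum(d[s] * pol[s][a] * P[s][a][s2] for s in range(S) for a in range(A))
             for s2 in range(S)]
    return d


def q_values(pol, iters=2000):
    """Q^pol computed by repeated Bellman evaluation backups."""
    q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        v = [sum(pol[s][a] * q[s][a] for a in range(A)) for s in range(S)]
        q = [[R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(S)) for a in range(A)]
             for s in range(S)]
    return q


def dr_estimate(w, q):
    """Baseline from q plus a w-weighted Bellman-residual correction under d_pi0."""
    v = [sum(PI[s][a] * q[s][a] for a in range(A)) for s in range(S)]
    baseline = (1 - GAMMA) * sum(D0[s] * v[s] for s in range(S))
    d_beh = visitation(PI0)
    correction = 0.0
    for s in range(S):
        for a in range(A):
            residual = R[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(S)) - q[s][a]
            rho = PI[s][a] / PI0[s][a]  # per-step action ratio pi / pi0
            correction += d_beh[s] * PI0[s][a] * w[s] * rho * residual
    return baseline + correction


d_pi, d_beh = visitation(PI), visitation(PI0)
w_true = [d_pi[s] / d_beh[s] for s in range(S)]   # stationary density ratio
q_true = q_values(PI)
w_bad = [1.0, 1.0]                                # deliberately wrong ratio
q_bad = [[0.0] * A for _ in range(S)]             # deliberately wrong Q-function
true_value = sum(d_pi[s] * PI[s][a] * R[s][a] for s in range(S) for a in range(A))

print(true_value)
print(dr_estimate(w_true, q_bad))  # correct ratio, wrong Q: matches true_value
print(dr_estimate(w_bad, q_true))  # wrong ratio, correct Q: matches true_value
print(dr_estimate(w_bad, q_bad))   # both wrong: biased
```

With the true ratio and a wrong Q-function, or a wrong ratio and the true Q-function, the estimate recovers the true value; only when both are wrong does bias remain. In practice both the ratio and the value function would be learned from off-policy data, and the expectations replaced by sample averages.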
