Conditional importance sampling for off-policy learning

Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana L Borsa, Tom Schaul, Remi Munos, Will Dabney

11 May 2021 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
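As a rough illustration of the idea named in the abstract, the sketch below compares an ordinary importance sampling ratio with its conditional expectation in a toy one-step problem. It is not taken from the paper: the three-action setup, the policies `mu` and `pi`, the `group` statistic, and all function names are assumptions made purely for illustration. It relies only on the standard fact that, when the return depends on the action solely through a statistic, replacing the ratio by its conditional expectation given that statistic keeps the estimator unbiased and cannot increase its variance.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): three actions, one step.
mu = np.array([0.5, 0.3, 0.2])   # behaviour policy (assumed)
pi = np.array([0.1, 0.6, 0.3])   # target policy (assumed)
group = np.array([0, 0, 1])      # statistic Z(a): actions 0 and 1 share a group
reward = np.array([1.0, 2.0])    # return depends on the action only through Z

# Ordinary importance sampling ratio rho(a) = pi(a) / mu(a).
rho = pi / mu

# Conditional ratio E_mu[rho | Z = z] = pi(Z = z) / mu(Z = z).
pi_mass = np.bincount(group, weights=pi)
mu_mass = np.bincount(group, weights=mu)
cond_rho = (pi_mass / mu_mass)[group]

g = reward[group]  # return observed for each action


def mean_and_variance(weights):
    """Exact mean and variance of weights * G when actions are drawn from mu."""
    samples = weights * g
    mean = np.sum(mu * samples)
    var = np.sum(mu * samples ** 2) - mean ** 2
    return mean, var


target_value = np.sum(pi * g)
is_mean, is_var = mean_and_variance(rho)
cond_mean, cond_var = mean_and_variance(cond_rho)

print(f"target value:   {target_value:.4f}")
print(f"ordinary IS:    mean={is_mean:.4f}, variance={is_var:.4f}")
print(f"conditional IS: mean={cond_mean:.4f}, variance={cond_var:.4f}")
```

Both estimators recover the target policy's expected return exactly, while the conditional version has the smaller variance in this example. The quantities are computed by enumeration rather than sampling so that the comparison is deterministic.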
