2021 (modified: 24 Feb 2022), ICML 2021, Readers: Everyone
Abstract: We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this pr...