2017 (modified: 11 Nov 2022), ICML 2017, Readers: Everyone
Abstract: We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance. We show...