2019 (modified: 11 Nov 2022), ICML 2019. Readers: Everyone
Abstract: This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guarant...