Safe Policy Improvement with Baseline BootstrappingDownload PDFOpen Website

2019 (modified: 11 Nov 2022)ICML 2019Readers: Everyone
Abstract: This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guarant...
0 Replies

Loading