Posterior Sampling: Make Reinforcement Learning Sample Efficient Again

25 Sep 2019 (modified: 24 Dec 2019) | ICLR 2020 Conference Withdrawn Submission | Readers: Everyone
  • Abstract: Machine learning thrives on leveraging structure in data, and many breakthroughs (e.g., convolutional networks) have come from designing algorithms that exploit the underlying structure of a distribution. Reinforcement learning agents interact with worlds that are similarly full of structure. For example, no sequence of actions an agent takes will ever cause the laws of physics to change, so an agent that learns to generalize such laws through time and space will have an advantage. Sample-efficient reinforcement learning can be achieved by assuming that the world has structure and designing learning algorithms that exploit this assumption without knowing the actual structure beforehand. Posterior Sampling for Reinforcement Learning (PSRL) (Strens, 2000) is one such method: it assumes structure in the world and exploits it for learning. A PSRL agent first samples a model of the environment that conforms to both prior assumptions about the world's structure and past observations, and then interacts with the true environment using a policy guided by the sampled model. While PSRL comes with theoretical Bayesian regret bounds, many open issues must be addressed before it can be applied to current benchmark continuous reinforcement learning tasks. In this work, we identify these issues and find practical solutions to them, leading to a novel algorithm we call Neural-PSRL. We validate the algorithm's effectiveness by achieving state-of-the-art results in the HalfCheetah-v3 and Hopper-v3 domains.
  • Keywords: Model Based Reinforcement Learning
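
The sample-then-act loop described in the abstract can be illustrated with a minimal tabular sketch. This is not the Neural-PSRL algorithm from the paper: it assumes a small discrete MDP with known rewards, a Dirichlet(1) prior over transitions, and finite-horizon value iteration as the planner. All names here (`sample_model`, `plan`, `psrl`, `TOY_P`, `TOY_R`) are hypothetical and chosen for illustration only.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_model(counts, rng):
    """Sample a transition model P[s][a][s'] from the Dirichlet posterior."""
    return [[sample_dirichlet(row, rng) for row in counts_s] for counts_s in counts]

def plan(P, R, horizon):
    """Finite-horizon value iteration on a model; returns policy[t][s]."""
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    policy = []
    for _ in range(horizon):  # iterate from the last step backwards
        Q = [[R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        policy.append([max(range(n_a), key=lambda a: Q[s][a]) for s in range(n_s)])
        V = [max(Q[s]) for s in range(n_s)]
    policy.reverse()  # policy[0] now applies to the first step of an episode
    return policy

def psrl(true_P, R, episodes=200, horizon=10, seed=0):
    """Tabular PSRL: sample a model, plan in it, act in the true MDP, update."""
    rng = random.Random(seed)
    n_s, n_a = len(true_P), len(true_P[0])
    # Assumption: Dirichlet(1) prior, stored as pseudo-counts per (s, a, s').
    counts = [[[1.0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    returns = []
    for _ in range(episodes):
        policy = plan(sample_model(counts, rng), R, horizon)
        s, total = 0, 0.0
        for t in range(horizon):
            a = policy[t][s]
            s2 = rng.choices(range(n_s), weights=true_P[s][a])[0]
            total += R[s][a]
            counts[s][a][s2] += 1.0  # posterior update from the observed transition
            s = s2
        returns.append(total)
    return returns, counts

# Hypothetical 2-state, 2-action MDP: action 1 tends to lead to state 1,
# where it pays reward 1; action 0 pays a small reward in state 0.
TOY_P = [[[1.0, 0.0], [0.1, 0.9]], [[1.0, 0.0], [0.1, 0.9]]]
TOY_R = [[0.1, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    returns, _ = psrl(TOY_P, TOY_R)
    print("mean return, first 20 episodes:", sum(returns[:20]) / 20)
    print("mean return, last 20 episodes:", sum(returns[-20:]) / 20)
```

Exploration here comes only from the randomness of the sampled model: early samples vary widely, and as the pseudo-counts grow the posterior concentrates and the planned policy stabilizes. Applying the same idea to continuous tasks such as HalfCheetah-v3 would require a different model class and planner, which this sketch does not attempt.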