Markov Chain Monte Carlo Policy Optimization

04 Jan 2021 (modified: 24 Mar 2021) · OpenReview Anonymous Preprint Blind Submission · Readers: Everyone
Abstract: Discovering approximately optimal policies, a task termed policy optimization, is crucial to applying reinforcement learning (RL) in many real-world scenarios. Viewing policy optimization from the perspective of variational inference, the representational power of the policy network allows us to obtain an approximate posterior over actions conditioned on states, under entropy or KL regularization. In practice, however, policy optimization may yield suboptimal policy estimates due to the amortization gap. Inspired by Markov Chain Monte Carlo (MCMC) techniques, we propose a new policy optimization method that, instead of optimizing policy parameters or policy distributions directly, incorporates gradient-based feedback in various ways. Empirical evaluation verifies the performance improvement of the proposed method on many continuous control benchmarks.
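The abstract describes refining an amortized policy's output with MCMC-style, gradient-based updates to close the amortization gap. A minimal sketch of that idea, not the paper's exact algorithm, is to take an action proposed by the policy network and run a few unadjusted Langevin steps on a learned critic Q(s, a); the quadratic `toy_q` below is a hypothetical stand-in for such a critic:

```python
import numpy as np

def toy_q(state, action):
    # Hypothetical critic: peaked at action = state, standing in for a learned Q(s, a).
    return -np.sum((action - state) ** 2)

def toy_q_grad(state, action):
    # Analytic gradient of the toy critic with respect to the action.
    return -2.0 * (action - state)

def langevin_refine(state, action, step_size=0.05, n_steps=50, noise_scale=0.01, seed=0):
    """Refine an initial (amortized) action with unadjusted Langevin steps:
    a <- a + eta * grad_a Q(s, a) + sqrt(2 * eta * tau) * N(0, I)."""
    rng = np.random.default_rng(seed)
    a = action.copy()
    for _ in range(n_steps):
        a = (a
             + step_size * toy_q_grad(state, a)
             + np.sqrt(2.0 * step_size * noise_scale) * rng.standard_normal(a.shape))
    return a

state = np.array([1.0, -0.5])
amortized_action = np.zeros(2)  # suboptimal output of a policy network (assumed)
refined_action = langevin_refine(state, amortized_action)
print(toy_q(state, amortized_action), toy_q(state, refined_action))
```

Under entropy-regularized RL, the target posterior over actions is proportional to exp(Q(s, a)/tau), so Langevin dynamics on Q is one natural way to inject gradient-based feedback into action sampling; the paper's specific variants may differ.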