Policy Optimization In the Face of Uncertainty

Tung-Long Vuong; Han Nguyen; Hai Pham; Kenneth Tran

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

Abstract: Model-based reinforcement learning has the potential to be more sample efficient than model-free approaches. However, existing model-based methods are vulnerable to model bias, which leads to poor generalization and worse asymptotic performance than their model-free counterparts. In this paper, we propose a novel policy optimization framework that uses an uncertainty-aware objective function to address these issues. In this framework, the agent simultaneously learns an uncertainty-aware dynamics model and optimizes the policy according to the learned model. The objective function can be represented end-to-end as a single computational graph, which allows seamless policy gradient computation via backpropagation through the model. In addition to being theoretically sound, our approach shows promising results on challenging continuous control benchmarks, with competitive asymptotic performance and sample complexity compared to state-of-the-art baselines.

Keywords: Reinforcement Learning, Model-based Reinforcement Learning
