[Re] When to Trust Your Model: Model-Based Policy Optimization

Published: 20 Feb 2020 · Last Modified: 05 May 2023 · NeurIPS 2019 Reproducibility Challenge Blind Report · Readers: Everyone
Abstract: Reinforcement learning algorithms generally belong to one of two categories: model-based techniques, which attempt to overcome the lack of prior knowledge by enabling the agent to construct a representation of its environment, and model-free techniques, which learn a direct mapping from states to actions. Model-free approaches are typically less practical because they require extensive interaction with the environment, which is time-consuming or expensive, while model-based approaches tend to achieve lower asymptotic performance due to model approximation error. To design an effective model-based algorithm, Janner et al. [1] study the role of model usage in policy optimization and introduce a practical algorithm called model-based policy optimization (MBPO), which makes limited use of a predictive model to achieve pronounced performance improvements compared to other model-based approaches. The authors of the paper [1] first formulate and analyze a general MBPO procedure with monotonic improvement, which uses a predictive model to optimize the policy, uses the policy to collect data, and trains the model on that data. Previous work has found it difficult to justify model usage because of pessimistic bounds on model error; this paper finds a way to modify those pessimistic bounds, which resolves the problem. Based on this analysis, the authors develop a simple model-based reinforcement learning algorithm that uses short model-generated rollouts branched from real data to improve model effectiveness. Experiments show that this algorithm learns faster than other state-of-the-art model-based methods such as STEVE, and matches the performance of the best model-free methods such as SAC [3].

In this reproducibility report, we study the MBPO algorithm described in the paper (detailed in Section 3). Our work mainly focuses on the replication of their algorithm and the re-implementation of their predictive model in PyTorch (detailed in Section 4). Lastly, we describe our experiments and provide an analysis of our results (detailed in Section 5).
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=BJg8cHBxUS
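
To make the branched-rollout scheme described in the abstract concrete, the following is a minimal, hypothetical sketch of an MBPO-style training loop: real experience trains a predictive model, short k-step rollouts branched from real states fill a model buffer, and the policy is updated mostly on that model-generated data. `DummyEnv`, `LearnedModel`, `Policy`, the rollout length `k`, and all loop sizes are illustrative assumptions, not the authors' implementation (which uses a probabilistic model ensemble and SAC).

```python
# Hypothetical MBPO-style loop: real data -> model training -> branched
# short rollouts -> policy updates on model data. Placeholders only.
import random
import numpy as np

class DummyEnv:
    """Stand-in for a continuous-control task: 1-D state, 1-D action."""
    def reset(self):
        self.s = np.zeros(1)
        return self.s
    def step(self, a):
        self.s = self.s + 0.1 * a + 0.01 * np.random.randn(1)
        return self.s, -float(np.abs(self.s).sum()), False

class LearnedModel:
    """Placeholder predictive model; the paper uses a probabilistic ensemble."""
    def train(self, transitions):
        pass  # fit model parameters to (s, a, r, s') tuples
    def predict(self, s, a):
        return s + 0.1 * a, -float(np.abs(s).sum())  # crude surrogate dynamics

class Policy:
    """Placeholder agent; the paper optimizes the policy with SAC."""
    def act(self, s):
        return np.random.uniform(-1, 1, size=1)
    def update(self, batch):
        pass  # one gradient step of the policy/value networks

env, model, policy = DummyEnv(), LearnedModel(), Policy()
real_buffer, model_buffer = [], []
k = 1  # rollout length kept short so model error does not compound

for epoch in range(10):
    # 1) collect real transitions with the current policy, retrain the model
    s = env.reset()
    for _ in range(100):
        a = policy.act(s)
        s_next, r, _ = env.step(a)
        real_buffer.append((s, a, r, s_next))
        s = s_next
    model.train(real_buffer)

    # 2) branch short model rollouts from states sampled out of the real buffer
    for _ in range(400):
        s_branch = random.choice(real_buffer)[0]
        for _ in range(k):
            a = policy.act(s_branch)
            s_pred, r_pred = model.predict(s_branch, a)
            model_buffer.append((s_branch, a, r_pred, s_pred))
            s_branch = s_pred

    # 3) update the policy mostly on model-generated data
    for _ in range(20):
        batch = random.sample(model_buffer, min(64, len(model_buffer)))
        policy.update(batch)
```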
