- Abstract: In complex simulated environments, model-based reinforcement learning methods typically lag behind the asymptotic performance of model-free approaches. This paper uses two MuJoCo environments to understand this gap through a series of ablation experiments designed to separate the contributions of the dynamics model and the planner. These experiments reveal the importance of long planning horizons, longer than those typically used. A dynamics model is introduced that directly predicts distant states from the current state and a long sequence of actions. This avoids the need for many recursive model applications during long-range planning, and thus yields more accurate state estimates. These accurate predictions allow us to uncover the relationship between model accuracy and performance, and they translate to higher task reward that matches or exceeds current state-of-the-art model-free approaches.
- Keywords: model-based reinforcement learning, mbrl, reinforcement learning, predictive models, predictive learning, forward models, deep learning
- TL;DR: Long-term prediction accuracy limits the performance of model-based RL, and can be improved with a simple change to the form of the model.
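
The difference between recursive and direct long-horizon prediction described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy linear system (`A`, `B`), the horizon `H`, and the helper names `rollout_recursive`, `predict_direct`, and `fit_direct_model` are hypothetical, and the least-squares fit stands in for whatever model class the paper actually uses. It does not reproduce the paper's architecture, environments, or results.

```python
import numpy as np

def rollout_recursive(step_model, state, actions):
    """Predict the state after len(actions) steps by applying a
    one-step model repeatedly; any per-step error is fed back in."""
    for action in actions:
        state = step_model(state, action)
    return state

def predict_direct(direct_model, state, actions):
    """Predict the same distant state with a single model call on
    (current state, full action sequence)."""
    return direct_model(state, np.concatenate(actions))

# Hypothetical toy dynamics (an assumption, not from the paper): s' = A s + B a
A = np.array([[0.9, 0.1], [0.0, 0.95]])
B = np.array([[0.0], [0.1]])
H = 20  # planning horizon, chosen arbitrarily for the example

def true_step(state, action):
    return A @ state + B @ action

def fit_direct_model(horizon, n_samples=500, seed=0):
    """Fit a linear map from (s_0, a_0..a_{H-1}) to s_H by least squares.
    A stand-in for a learned direct model."""
    rng = np.random.default_rng(seed)
    inputs, targets = [], []
    for _ in range(n_samples):
        s0 = rng.normal(size=2)
        acts = rng.normal(size=(horizon, 1))
        inputs.append(np.concatenate([s0, acts.ravel()]))
        targets.append(rollout_recursive(true_step, s0, acts))
    W, *_ = np.linalg.lstsq(np.array(inputs), np.array(targets), rcond=None)
    return lambda state, action_seq: np.concatenate([state, action_seq]) @ W

direct_model = fit_direct_model(H)

s0 = np.array([1.0, -0.5])
acts = np.full((H, 1), 0.3)

s_recursive = rollout_recursive(true_step, s0, acts)   # H model calls
s_direct = predict_direct(direct_model, s0, acts)      # one model call
```

On this noise-free linear toy the two predictions agree, because the one-step model used in the rollout is exact. The point of the sketch is only the interface: the recursive rollout feeds each prediction back into the model `H` times, whereas the direct model is queried once per distant state.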