- Keywords: reinforcement learning
- Abstract: Reinforcement learning algorithms for real-world autonomous driving must be able to handle complex, unknown dynamical systems. This requirement is handled well by model-free algorithms such as PPO. However, model-free approaches tend to be substantially less sample-efficient than model-based ones. In this work, we aim to retain the advantages of model-free methods while increasing the stability and data efficiency of PPO. To this end, we propose a world model that models popular reinforcement learning environments through compressed spatio-temporal representations, which allows a model-free method to learn behaviors from imagined outcomes and thereby increase sample efficiency. The experimental results indicate that our approach mitigates the sample inefficiency of PPO, increases its stability, and substantially reduces the training time. Code is available at www.github.com/Mrmoore98/World-Model.git. The video can be found here.