SpeedyZero: Mastering Atari with Limited Data and Time


22 Sept 2022, 12:35 (modified: 15 Nov 2022, 04:19)
ICLR 2023 Conference Blind Submission
Readers: Everyone
Keywords: Reinforcement Learning System, Distributed Training, Model-Based Reinforcement Learning
TL;DR: SpeedyZero is a distributed model-based RL training system based on EfficientZero, featuring fast training speed and high sample efficiency.
Abstract: Many recent breakthroughs in deep reinforcement learning (RL) are built upon large-scale distributed training of model-free methods using millions to billions of samples. State-of-the-art model-based RL methods, on the other hand, can achieve human-level sample efficiency but often take much longer to train overall than model-free methods. Yet both high sample efficiency and short training time are important to many real-world applications. We develop SpeedyZero, a distributed RL system built upon a state-of-the-art model-based RL method, EfficientZero, with a dedicated system design for fast distributed computation. We also develop a novel algorithmic technique, Priority Refresh, to stabilize massively parallel model-based training. SpeedyZero maintains sample efficiency on par with EfficientZero while achieving a 20X speedup in wall-clock time, reaching human-level performance on the Atari benchmark within 30 minutes using only 300k samples. In addition, we present an in-depth analysis of the fundamental challenges in further scaling our system to bring insights to the community.
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)