Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Shengyi Huang; Jiayi Weng; Rujikorn Charakorn; Min Lin; Zhongwen Xu; Santiago Ontanon

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Shengyi Huang, Jiayi Weng, Rujikorn Charakorn, Min Lin, Zhongwen Xu, Santiago Ontanon

Published: 16 Jan 2024, Last Modified: 14 Mar 2024ICLR 2024 posterEveryoneRevisionsBibTeX

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Distributed, Deep Reinforcement Learning, Distributed Deep Reinforcement Learning, Reproducibility

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: IMPALA has reproducibility issues; we propose a more reproducible architecture called Cleanba; our Atari-57 experiments shows Cleanba PPO and IMPALA variants outperform torchbeast and moolib's IMPALA.

Abstract: Distributed Deep Reinforcement Learning (DRL) aims to leverage more computational resources to train autonomous agents with less training time. Despite recent progress in the field, reproducibility issues have not been sufficiently explored. This paper first shows that the typical actor-learner framework can have reproducibility issues even if hyperparameters are controlled. We then introduce Cleanba, a new open-source platform for distributed DRL that proposes a highly reproducible architecture. Cleanba implements highly optimized distributed variants of PPO and IMPALA. Our Atari experiments show that these variants can obtain equivalent or higher scores than strong IMPALA baselines in moolib and torchbeast and PPO baseline in CleanRL. However, Cleanba variants present 1) shorter training time and 2) more reproducible learning curves in different hardware settings.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: pdf

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Primary Area: reinforcement learning

Submission Number: 1394

Loading