# CleanRL-supported Papers / Projects

CleanRL has become an increasingly popular deep reinforcement learning library, especially among practitioners who prefer more customizable code. Since its debut in July 2019, CleanRL has supported many open source projects and publications, some of which are listed below.

**Feel free to edit this list if your project or paper has used CleanRL.**

## Publications

* Md Masudur Rahman and Yexiang Xue. "Bootstrap Advantage Estimation for Policy Optimization in Reinforcement Learning." In Proceedings of the IEEE International Conference on Machine Learning and Applications (ICMLA), 2022. [https://arxiv.org/pdf/2210.07312.pdf](https://arxiv.org/pdf/2210.07312.pdf)

* Centa, Matheus, and Philippe Preux. "Soft Action Priors: Towards Robust Policy Transfer." arXiv preprint arXiv:2209.09882 (2022). [https://arxiv.org/pdf/2209.09882.pdf](https://arxiv.org/pdf/2209.09882.pdf)

* Weng, Jiayi, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu et al. "Envpool: A highly parallel reinforcement learning environment execution engine." In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. [https://openreview.net/forum?id=BubxnHpuMbG](https://openreview.net/forum?id=BubxnHpuMbG)

* Huang, Shengyi, Rousslan Fernand Julien Dossa, Antonin Raffin, Anssi Kanervisto, and Weixun Wang. "The 37 Implementation Details of Proximal Policy Optimization." International Conference on Learning Representations 2022 Blog Post Track, [https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/)

* Huang, Shengyi, and Santiago Ontañón. "A closer look at invalid action masking in policy gradient algorithms." The International FLAIRS Conference Proceedings, 35. [https://journals.flvc.org/FLAIRS/article/view/130584](https://journals.flvc.org/FLAIRS/article/view/130584)

* Schmidt, Dominik, and Thomas Schmied. "Fast and Data-Efficient Training of Rainbow: an Experimental Study on Atari." Deep Reinforcement Learning Workshop at the 35th Conference on Neural Information Processing Systems, [https://arxiv.org/abs/2111.10247](https://arxiv.org/abs/2111.10247)

* Dossa, Rousslan Fernand Julien, Shengyi Huang, Santiago Ontañón, and Takashi Matsubara. "An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization." IEEE Access 9 (2021): 117981-117992. [https://ieeexplore.ieee.org/abstract/document/9520424](https://ieeexplore.ieee.org/abstract/document/9520424)

* Huang, Shengyi, Santiago Ontañón, Chris Bamford, and Lukasz Grela. "Gym-µRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning." In 2021 IEEE Conference on Games (CoG), pp. 1-8. IEEE, 2021. [https://ieeexplore.ieee.org/abstract/document/9619076](https://ieeexplore.ieee.org/abstract/document/9619076)

* Huang, Shengyi, and Santiago Ontañón. "Measuring Generalization of Deep Reinforcement Learning Applied to Real-time Strategy Games." AAAI 2021 Reinforcement Learning in Games Workshop, [http://aaai-rlg.mlanctot.info/papers/AAAI21-RLG_paper_33.pdf](http://aaai-rlg.mlanctot.info/papers/AAAI21-RLG_paper_33.pdf)

* Bamford, Chris, Shengyi Huang, and Simon Lucas. "Griddly: A platform for AI research in games." AAAI 2021 Reinforcement Learning in Games Workshop, [https://arxiv.org/abs/2011.06363](https://arxiv.org/abs/2011.06363)

* Huang, Shengyi, and Santiago Ontañón. "Action guidance: Getting the best of sparse rewards and shaped rewards for real-time strategy games." AIIDE Workshop on Artificial Intelligence for Strategy Games, [https://arxiv.org/abs/2010.03956](https://arxiv.org/abs/2010.03956)

* Huang, Shengyi, and Santiago Ontañón. "Comparing Observation and Action Representations for Deep Reinforcement Learning in µRTS." AIIDE Workshop on Artificial Intelligence for Strategy Games, October 2019, [https://arxiv.org/abs/1910.12134](https://arxiv.org/abs/1910.12134)
