- Abstract: While deep reinforcement learning (DRL) has enjoyed several recent successes, results reported in the literature are often difficult to reliably reproduce. Difficulties in reproducibility can arise due to many factors, including the lack of access to computational resources or the lack of knowledge of specific implementation details. One factor of particular importance to DRL is the ability to control for sources of nondeterminism during the training process. This is because DRL is faced with the challenges of a nonstationary training distribution and additional sources of randomness that are absent from other areas of machine learning. In this paper, we (1) enable deterministic training in DRL by identifying and controlling for all sources of nondeterminism present during training, and (2) perform an ablation study that shows how these sources of nondeterminism can impact the performance of a DRL agent. We find that even simple sources of nondeterminism such as those stemming from nondeterministic GPU operations can lead to large differences in performance between training runs. Lastly, we make available our deterministic implementation of deep Q-learning.
- TL;DR: We describe a deterministic implementation of Deep Q-learning and use it to show that individual sources of nondeterminism in the deep reinforcement learning process can cause large variation in results.
- Keywords: Reproducibility, Deep Reinforcement Learning, Replicability, Determinism
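The abstract's central point, that deterministic training requires controlling every source of randomness, can be illustrated with a minimal sketch. The helper name `seeded_rollout` is hypothetical, not from the paper's released implementation; the commented PyTorch flags (`torch.backends.cudnn.deterministic`, `torch.backends.cudnn.benchmark`) are real PyTorch settings commonly used to suppress the nondeterministic GPU kernels the abstract mentions, but exact coverage depends on the library version.

```python
import random

def seeded_rollout(seed, n=5):
    # Control one source of nondeterminism: seed a private RNG instance
    # so the "environment" sequence is fully determined by the seed.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Two runs with the same seed are bitwise identical:
assert seeded_rollout(42) == seeded_rollout(42)

# In a DRL training loop one would additionally pin the framework RNGs and
# GPU kernels (hypothetical usage, assuming PyTorch):
# torch.manual_seed(seed)
# torch.backends.cudnn.deterministic = True   # forbid nondeterministic kernels
# torch.backends.cudnn.benchmark = False      # disable autotuned kernel selection
```

This only sketches RNG seeding; the paper's point is that full determinism also requires controlling environment stochasticity, exploration, replay sampling, and GPU operation ordering.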