Reinforcement learning algorithm development has progressed substantially since the initial study that enabled sequential decision making from high-dimensional observations. Recent breakthroughs in deep reinforcement learning range from learning without access to rewards to learning effective policies without even knowing the rules of the game. In this paper we examine the underlying premises that are actively used in deep reinforcement learning algorithm development. We theoretically demonstrate that the performance profiles of algorithms developed for the data-abundant regime do not transfer monotonically to the data-limited regime. We further conduct large-scale experiments in the Arcade Learning Environment, and our results demonstrate that baseline algorithms perform significantly better in the data-limited regime than the set of algorithms that were originally designed and compared in the data-abundant regime.
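The non-monotonic transfer claim can be illustrated with a small sketch. The learning curves below are entirely hypothetical (they are not from the paper): they simply show how an algorithm's ranking can invert between a small sample budget and a large one, so that a comparison run only in the data-abundant regime says nothing reliable about the data-limited regime.

```python
# Hypothetical illustration of a rank inversion between data regimes.
# Both curves are made-up saturating functions of environment frames;
# neither corresponds to a real algorithm from the paper.

def score_baseline(frames: int) -> float:
    # Fast early learning, lower asymptote (hypothetical).
    return 100 * frames / (frames + 2e5)

def score_modern(frames: int) -> float:
    # Slow early learning, higher asymptote (hypothetical).
    return 140 * frames / (frames + 2e6)

low_data = 100_000        # a small budget, e.g. the Atari 100k setting
high_data = 200_000_000   # a large budget, e.g. the classic 200M-frame ALE setting

# In the data-limited regime the baseline ranks first ...
print(score_baseline(low_data) > score_modern(low_data))    # True
# ... while in the data-abundant regime the ordering inverts.
print(score_baseline(high_data) < score_modern(high_data))  # True
```

Because the curves cross, any ranking measured at one budget extrapolates to the other only if the curves never intersect, which is exactly the monotonic-transfer assumption the abstract calls into question.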