Abstract: Progress in reinforcement learning algorithm development is at one of its highest points since the initial study that enabled sequential decision making from high-dimensional observations. Recent breakthroughs in deep reinforcement learning range from learning without rewards to learning functioning policies without knowing the rules of the game. In this paper we focus on current trends in deep reinforcement learning algorithm development in the low-data regime. We show theoretically that the performance profiles of algorithms developed for the high-data regime do not transfer to the low-data regime in the same order. We conduct extensive experiments in the Arcade Learning Environment, and our results demonstrate that baseline algorithms perform significantly better in the low-data regime than the algorithms that were originally designed and compared in the high-data regime.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)