Counterintuitive RL: The Hidden Value of Acting Bad

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Counterintuitive, reinforcement learning
Abstract:

Learning to make sequential decisions solely from interaction with an environment, without any supervision, first became possible when deep neural networks were introduced as function approximators to represent and learn value functions in high-dimensional MDPs. During experience collection, however, reinforcement learning policies face exponentially growing state spaces in high-dimensional MDPs, creating a tension between computational complexity and policy success. In this paper we focus on the agent's interaction with the environment during the learning phase in high-dimensional MDPs, and we introduce a theoretically grounded novel method based on experiences obtained through extremum actions. Our analysis and method provide a theoretical basis for effective, accelerated, and efficient experience collection, incur zero additional computational cost, and lead to significant acceleration of training in deep reinforcement learning. We conduct extensive experiments in the Arcade Learning Environment with high-dimensional state representations, and demonstrate that our technique improves the human-normalized median score in the Arcade Learning Environment by 248% in the low-data regime.
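The abstract does not spell out the exact action-selection rule behind "extremum actions." As one illustrative reading, the agent might occasionally interleave the extremum (e.g., lowest-Q) action with its greedy choice during experience collection. The sketch below is a minimal assumption-laden illustration of that idea; the function name, the probability `rho`, and the choice of minimum-Q as the extremum are all hypothetical, not the paper's actual method.

```python
import numpy as np

def extremum_action_policy(q_values, rho=0.1, rng=None):
    """Select an action from a vector of Q-value estimates.

    With probability `rho`, take the extremum (minimum-Q) action to
    diversify the collected experience ("acting bad"); otherwise act
    greedily. Both `rho` and the min-Q rule are illustrative
    assumptions, not the paper's exact procedure.
    """
    rng = rng or np.random.default_rng()
    q_values = np.asarray(q_values)
    if rng.random() < rho:
        return int(np.argmin(q_values))  # extremum action
    return int(np.argmax(q_values))      # greedy action
```

Selecting the extremum requires only an `argmin` over Q-values already computed for the greedy step, which is consistent with the paper's claim of zero additional computational cost.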

Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10690