Learning to make sequential decisions solely from interaction with an environment, without any supervision, was first achieved by introducing deep neural networks as function approximators to represent and learn a value function in high-dimensional MDPs. In such MDPs, reinforcement learning policies face exponentially growing state spaces during experience collection, resulting in a trade-off between computational complexity and policy success. In this paper we focus on the agent's interaction with the environment in a high-dimensional MDP during the learning phase, and we introduce a theoretically founded novel method based on experiences obtained through extremum actions. Our analysis and method provide a theoretical basis for effective, accelerated, and efficient experience collection, incur zero additional computational cost, and lead to significant acceleration of training in deep reinforcement learning. We conduct extensive experiments in the Arcade Learning Environment, a suite of MDPs with high-dimensional state representations, and demonstrate that our technique improves human-normalized median scores in the Arcade Learning Environment by 248% in the low-data regime.
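The abstract does not specify the exact collection rule, so the following is only a minimal, hypothetical sketch. It assumes "extremum actions" refers to actions with extreme (maximal or, occasionally, minimal) estimated Q-values, and it uses a placeholder `q_values` function in place of a learned Q-network; the names `select_action`, `epsilon`, and `extremum_prob` are illustrative, not from the paper.

```python
import random
import numpy as np


def q_values(state, n_actions=4):
    """Placeholder for a learned Q-network; returns one value per action."""
    rng = np.random.default_rng(abs(hash(state)) % (2**32))
    return rng.normal(size=n_actions)


def select_action(state, epsilon=0.1, extremum_prob=0.5):
    """Pick an action for experience collection (hypothetical rule).

    With probability `epsilon`, explore uniformly at random; otherwise act on
    an extremum of the value estimates: the argmax of the Q-values, or (with
    probability `extremum_prob`) the argmin, so that transitions with extreme
    value estimates are gathered at no extra cost beyond the usual forward pass.
    """
    q = q_values(state)
    if random.random() < epsilon:
        return random.randrange(len(q))  # uniform exploration
    if random.random() < extremum_prob:
        return int(np.argmin(q))  # minimal-value (extremum) action
    return int(np.argmax(q))      # maximal-value (extremum) action


if __name__ == "__main__":
    for step, state in enumerate(["s0", "s1", "s2"]):
        print(step, select_action(state))
```

The collected transitions would then be stored in the replay buffer as usual; the sketch only replaces the action-selection step, which is why it adds no computational cost beyond the Q-value forward pass already required by standard epsilon-greedy collection.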