Analyzing the Effects of Emulating on the Reinforcement Learning Manifold

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement Learning Manifold, Volatility, Learning via Emulating
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Reinforcement learning has become a prominent research direction, with deep neural networks used as state-action value function approximators enabling exploration and the construction of functioning neural policies in MDPs with high-dimensional state representations. While reinforcement learning is currently deployed in many settings, from medicine to finance, the fact that it requires a reward signal from the MDP to learn a functioning policy can be restrictive for tasks in which constructing the reward function is as complex as, or more complex than, learning it. In this line of research, several studies have proposed algorithms that learn a reward function or an optimal policy from observed optimal trajectories. In this paper, we focus on the non-robustness of state-of-the-art algorithms that learn without rewards in MDPs with high-dimensional state representations, and we demonstrate that vanilla-trained deep reinforcement learning policies are more resilient and value-aligned than policies learned without rewards in MDPs with complex state representations.
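To make the contrast in the abstract concrete, below is a minimal sketch (not from the submission and not the authors' method) of the standard reward-based setup it refers to: a deep neural network acting as a state-action value function approximator with epsilon-greedy exploration. All class and function names here are illustrative assumptions.

```python
# Minimal sketch of a DQN-style state-action value approximator.
# This is illustrative only; names and architecture are assumptions,
# not the submission's actual model.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Maps a high-dimensional state vector to one Q-value per discrete action."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def epsilon_greedy(q_net: QNetwork, state: torch.Tensor, epsilon: float) -> int:
    """Explore with probability epsilon; otherwise act greedily on Q-values."""
    num_actions = q_net.net[-1].out_features
    if torch.rand(()) < epsilon:
        return int(torch.randint(num_actions, ()))
    with torch.no_grad():
        return int(q_net(state).argmax())
```

Reward-free approaches mentioned in the abstract (e.g., imitation learning or inverse reinforcement learning) would instead fit such a network, or a reward model, to observed optimal trajectories rather than to an environment-provided reward signal.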
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7613