HiT-MDP: Learning the SMDP option framework on MDPs with Hidden Temporal Variables


22 Sept 2022, 12:39 (modified: 18 Nov 2022, 15:46)
ICLR 2023 Conference Blind Submission
Readers: Everyone
Keywords: Hierarchical Reinforcement Learning, Reinforcement Learning, Markov Decision Process
Abstract: The standard option framework is developed on the Semi-Markov Decision Process (SMDP), which is unstable to optimize and sample-inefficient. To address this, we propose a novel Markov Decision Process (MDP), the Hidden Temporal MDP (HiT-MDP), and prove that the option-induced HiT-MDP is homomorphically equivalent to the option-induced SMDP. We also derive a sample-efficient, structured variational inference-based algorithm, which leads to a novel and stable option discovery method under the maximum-entropy reinforcement learning framework. Extensive experiments on challenging MuJoCo environments demonstrate HiT-MDP's efficiency and effectiveness: under widely used configurations, HiT-MDP achieves competitive, if not better, performance compared to state-of-the-art baselines on all finite-horizon and transfer learning environments. Moreover, HiT-MDP significantly outperforms all baselines on infinite-horizon environments while exhibiting smaller variance, faster convergence, and better interpretability.
Supplementary Material: zip
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)