Keywords: Energy-based Models, Predictive State Representation, Partially Observable Markov Decision Process, Reinforcement Learning
TL;DR: We propose a novel predictive state representation based on energy-based models that shows superior performance on POMDPs.
Abstract: In real-world applications, a reinforcement learning algorithm usually has to handle partial observability, which falls outside the Markov decision process (MDP) framework. Although the partially observable Markov decision process (POMDP) was introduced precisely to meet this requirement, the formulation raises significant computational and statistical hardness challenges in learning and planning. In this work, we introduce the Energy-based Predictive Representation (EPR), which leads to a unified framework for practical reinforcement learning algorithm design in both MDP and POMDP settings and handles learning, exploration, and planning in a coherent way. The proposed approach relies on a powerful neural energy-based model to extract a sufficient representation, from which Q-functions can be efficiently approximated. With such a representation, we develop an efficient approach for computing confidence, which allows optimism or pessimism in the face of uncertainty to be implemented efficiently in planning, thereby managing the exploration versus exploitation tradeoff. An experimental investigation shows that the proposed algorithm can outperform existing state-of-the-art baselines in both MDP and POMDP settings.
Supplementary Material: zip
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)