Energy-based Predictive Representation for Reinforcement Learning

Tianjun Zhang; Tongzheng Ren; Chenjun Xiao; Wenli Xiao; Joseph E. Gonzalez; Dale Schuurmans; Bo Dai

Energy-based Predictive Representation for Reinforcement Learning

Tianjun Zhang, Tongzheng Ren, Chenjun Xiao, Wenli Xiao, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

Published: 01 Feb 2023, Last Modified: 13 Feb 2023Submitted to ICLR 2023Readers: Everyone

Keywords: Energy-based Models, Predictive State Representation, Partially Observable Markov Decision Process, Reinforcement Learning

TL;DR: We propose a novel predictive state representation with energy-based models, that shows superior performance on POMDPs.

Abstract: In real world applications, it is usually necessary for a reinforcement learning algorithm to handle the partial observability beyond Markov decision processes (MDPs). Although the partially observable Markov decision process (POMDP) has been precisely motivated for this requirement, such a formulation raises significant computational and statistical hardness challenges in learning and planning. In this work, we introduce the Energy-based Predictive Representation (EPR), which leads to a unified framework for practical reinforcement learning algorithm design in both MDPs and POMDPs settings, to handle the learning, exploration, and planning in a coherent way. The proposed approach relies on the powerful neural energy-based model to extract sufficient representation, from which Q-functions can be efficiently approximated. With such a representation, we develop an efficient approach for computing confidence, which allows optimism/pessimism in the face of uncertainty to be efficiently implemented in planning, hence managing the exploration versus exploitation tradeoff. An experimental investigation shows that the proposed algorithm can surpass state-of-the-art performance in both MDP and POMDP settings in comparison to existing baselines.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)

Supplementary Material: zip

10 Replies

Loading