Energy-based Predictive Representations for Partially Observed Reinforcement Learning

Published: 08 May 2023, Last Modified: 26 Jun 2023, UAI 2023
Keywords: Energy-based Models, Predictive State Representation, Partially Observable Markov Decision Process, Reinforcement Learning
TL;DR: We propose a novel predictive state representation based on energy-based models that achieves superior performance on POMDPs.
Abstract: In real-world applications, handling partial observability is a common requirement for reinforcement learning algorithms, yet it is not captured by a Markov decision process (MDP). Although partially observable Markov decision processes (POMDPs) are designed specifically to address this requirement, they pose significant computational and statistical challenges in learning and planning. In this work, we introduce the \emph{Energy-based Predictive Representation (EPR)} to provide a unified approach for designing practical reinforcement learning algorithms in both the MDP and POMDP settings. The framework enables coherent handling of \emph{learning, exploration, and planning}. It leverages a powerful neural energy-based model to extract a representation under which Q-functions can be approximated efficiently. This representation also supports efficient computation of confidence estimates, enabling optimism or pessimism in planning under uncertainty and thereby managing the trade-off between exploration and exploitation. Experiments demonstrate that the proposed algorithm achieves state-of-the-art performance in both MDP and POMDP settings.
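The abstract's recipe, learn a representation, fit a linear Q-function on top of it, and add a confidence term for optimistic or pessimistic planning, can be sketched as below. This is an illustrative assumption, not the paper's actual EPR method: a fixed random projection stands in for the learned neural energy-based representation, and the confidence term is a generic LinUCB-style elliptical bonus; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature map: in EPR this representation would be learned by a
# neural energy-based model; here a fixed random projection stands in for it.
D_IN, D_FEAT = 8, 16
W1 = rng.normal(size=(D_IN, D_FEAT))

def phi(sa):
    """Representation of a (state/history, action) input vector."""
    return np.tanh(sa @ W1)

# Linear Q-function on top of the representation: Q(s, a) = w . phi(s, a).
w = rng.normal(size=D_FEAT)

def q_value(sa):
    return phi(sa) @ w

# Confidence via the regularized feature covariance of previously seen pairs.
lam = 1.0
Sigma = lam * np.eye(D_FEAT)
for _ in range(100):  # stand-in for a replay buffer of observed inputs
    f = phi(rng.normal(size=D_IN))
    Sigma += np.outer(f, f)

def bonus(sa, beta=1.0):
    """Elliptical confidence width; add for optimism, subtract for pessimism."""
    f = phi(sa)
    return beta * np.sqrt(f @ np.linalg.solve(Sigma, f))

sa_new = rng.normal(size=D_IN)
optimistic_q = q_value(sa_new) + bonus(sa_new)
print(optimistic_q)
```

Because `Sigma` is positive definite, the bonus is strictly positive and shrinks as more data resembling `sa_new` is accumulated, which is the mechanism behind the exploration/exploitation trade-off the abstract describes.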
Supplementary Material: pdf
Other Supplementary Material: zip