Informed POMDP: Leveraging Additional Information in Model-Based RL

Published: 20 Jul 2023, Last Modified: 01 Sept 2023, EWRL16, Readers: Everyone
Keywords: POMDP, RNN, Sufficient Statistic, Model-Based, Privileged Information, Asymmetric Learning
TL;DR: We introduce the informed POMDP to account for additional information available at training time, and propose an objective that uses this information to learn a sufficient statistic and a world model, increasing the convergence speed of the policies.
Abstract: In this work, we generalize the problem of learning through interaction in a POMDP by accounting for possible additional information available at training time. First, we introduce the informed POMDP, a new learning paradigm that clearly distinguishes the information available at training time from the observation available at execution time. Next, we propose an objective that leverages this information to learn, from the history, a sufficient statistic for optimal control. We then show that this informed objective amounts to learning an environment model from which latent trajectories can be sampled. Finally, we show for the Dreamer algorithm that using this informed environment model improves the convergence speed of the policies in several environments, sometimes greatly. These results and the simplicity of the proposed adaptation advocate for systematically considering possible additional information when learning in a POMDP with model-based RL.
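As a minimal sketch of the distinction between training information and execution observation (not the authors' implementation), the Python below models a hypothetical toy informed POMDP. The hidden state is a position on a ring, the information i_t reveals that position exactly, and the observation o_t is a lossy function of i_t (its parity). The policy conditions only on past observations, while training pairs use i_t as the prediction target. All names (`ToyInformedEnv`, `Step`, `policy`, `informed_training_pairs`) are illustrative assumptions. Using i_t in place of o_t as the model's reconstruction target, where standard Dreamer reconstructs observations, is an assumed reading of the "informed objective" described above.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    """One transition as recorded at training time (hypothetical structure)."""
    obs: int               # execution observation o_t: available at test time
    info: int              # training information i_t: available only during training
    action: Optional[int]  # action that led to this step (None right after reset)
    reward: float


class ToyInformedEnv:
    """Hypothetical toy informed POMDP: the hidden state is a position on a ring.

    The information i_t is the full state; the observation o_t is computed
    from i_t alone (its parity), so o_t depends on the state only through i_t.
    """

    def __init__(self, n_positions: int = 6, seed: int = 0):
        self.n = n_positions
        self.rng = random.Random(seed)
        self.state = 0

    def _info(self) -> int:
        return self.state  # privileged information: the exact position

    def _obs(self, info: int) -> int:
        return info % 2  # lossy observation derived from the information

    def reset(self) -> Step:
        self.state = self.rng.randrange(self.n)
        info = self._info()
        return Step(obs=self._obs(info), info=info, action=None, reward=0.0)

    def step(self, action: int) -> Step:
        # action in {-1, +1} moves around the ring; reward 1 when reaching position 0
        self.state = (self.state + action) % self.n
        info = self._info()
        reward = 1.0 if self.state == 0 else 0.0
        return Step(obs=self._obs(info), info=info, action=action, reward=reward)


def policy(obs_history: List[int]) -> int:
    """Hypothetical fixed execution-time policy: it sees observations only."""
    return 1 if obs_history[-1] == 1 else -1


def collect_episode(env: ToyInformedEnv, horizon: int = 10) -> List[Step]:
    """Roll out the policy; info is recorded for training but never fed to the policy."""
    steps = [env.reset()]
    obs_history = [steps[0].obs]
    for _ in range(horizon):
        step = env.step(policy(obs_history))
        steps.append(step)
        obs_history.append(step.obs)
    return steps


def informed_training_pairs(steps: List[Step]) -> List[Tuple[tuple, int]]:
    """Build (history, target) pairs: the history holds only actions and
    observations, while the target is the training information i_t."""
    pairs = []
    for t in range(len(steps)):
        history = tuple((s.action, s.obs) for s in steps[: t + 1])
        pairs.append((history, steps[t].info))
    return pairs
```

For example, `informed_training_pairs(collect_episode(ToyInformedEnv(seed=1), horizon=5))` yields six pairs whose inputs never contain `info`. The design choice illustrated is that the additional information enters only as a learning target, so the resulting policy remains executable from observations alone.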