Informed POMDP: Leveraging Additional Information in Model-Based RL

Published: 19 Jun 2023, Last Modified: 09 Jul 2023Frontiers4LCDEveryoneRevisionsBibTeX
Keywords: POMDP, Sufficient Statistic, Model-Based, Privileged Information, RNN, Asymmetric Learning
TL;DR: We introduce the informed POMDP to account for additional information available at training time, and propose an objective for learning a world model using this information to increase the convergence speed of the policies
Abstract: In this work, we generalize the problem of learning through interaction in a POMDP by accounting for eventual additional information available at training time. First, we introduce the informed POMDP, a new learning paradigm offering a clear distinction between the training information and the execution observation. Next, we propose an objective for learning a sufficient statistic from the history for the optimal control that leverages this information. We then show that this informed objective consists of learning an environment model from which we can sample latent trajectories. Finally, we show for the Dreamer algorithm that the convergence speed of the policies is sometimes greatly improved on several environments by using this informed environment model. Those results and the simplicity of the proposed adaptation advocate for a systematic consideration of eventual additional information when learning in a POMDP using model-based RL.
Submission Number: 132
Loading