FutureDD: Planning in POMDP with Encoded Future Dynamics

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Offline Reinforcement Learning, Partially Observable Markov Decision Process, Sequential Decision Making, Diffusion Models
TL;DR: Developing a framework that utilizes encoded future dynamics to reduce uncertainty for decision making in partially observable environments
Abstract: The partially observable Markov decision process (POMDP) is a powerful framework for modeling decision-making problems in which agents do not have full access to environment states. In offline reinforcement learning (RL), agents must extract policies from previously recorded decision-making datasets without directly interacting with the environment. Due to the inherent partial observability of environments and the limited availability of offline data, agents must be able to extract valuable insights from limited data, which can serve as crucial prior information for making informed decisions. Recent works have shown that deep generative models, particularly diffusion models, exhibit impressive performance in offline RL. However, most of these approaches focus on fully observed environments while neglecting POMDPs, and they rely heavily on history information for decision-making, disregarding the valuable prior information about the future that can be extracted from offline data. Recognizing this gap, we propose a novel framework, $\textit{FutureDD}$, to extract a future prior. $\textit{FutureDD}$ leverages an auxiliary prior model that encodes future sub-trajectories into a latent variable, which complements the direct modeling of observations with a diffusion model. This enables $\textit{FutureDD}$ to extract richer prior information from limited offline data, allowing agents to predict potential future dynamics. Experimental results on a set of tasks demonstrate that, in the context of POMDPs, $\textit{FutureDD}$ provides a simple yet effective approach for agents to learn behaviours yielding higher returns.
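
To make the abstract's architecture concrete, below is a minimal, hedged sketch of the idea as described: an auxiliary prior model encodes a future sub-trajectory into a latent variable z, and a diffusion-style denoiser is conditioned on both an observation summary and z. All module names (FuturePrior, ConditionalDenoiser), shapes, and design choices here are illustrative assumptions for exposition, not the authors' implementation.

import torch
import torch.nn as nn


class FuturePrior(nn.Module):
    """Encodes a future sub-trajectory (observation-action pairs) into a latent z."""

    def __init__(self, obs_dim, act_dim, latent_dim, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, 2 * latent_dim)  # mean and log-variance

    def forward(self, future_obs, future_act):
        x = torch.cat([future_obs, future_act], dim=-1)  # (B, H, obs_dim + act_dim)
        _, h = self.rnn(x)                               # h: (1, B, hidden)
        mean, logvar = self.to_latent(h[-1]).chunk(2, dim=-1)
        z = mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # reparameterization
        return z, mean, logvar


class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to an action sequence, conditioned on an
    observation summary, the future latent z, and the diffusion timestep."""

    def __init__(self, obs_dim, act_dim, latent_dim, horizon, hidden=256):
        super().__init__()
        in_dim = horizon * act_dim + obs_dim + latent_dim + 1  # +1 for timestep
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, horizon * act_dim),
        )

    def forward(self, noisy_actions, obs_summary, z, t):
        B = noisy_actions.shape[0]
        x = torch.cat([noisy_actions.reshape(B, -1), obs_summary, z, t], dim=-1)
        return self.net(x).reshape(B, -1, noisy_actions.shape[-1])

In such a setup, the future sub-trajectory is available at training time from the offline dataset, so z can be computed directly; at test time one would presumably sample z from a learned (e.g., history-conditioned) prior, though the exact training and sampling procedure is not specified in the abstract.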
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2296