PO-Dreamer: Memory-Guided World Models for Partially Observable Reinforcement Learning

17 Sept 2025 (modified: 05 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: reinforcement learning, world model
Abstract: World models predict future states and rewards by learning compact state representations of the environment, thereby enabling efficient policy optimization. World-model-based reinforcement learning (RL) algorithms have demonstrated significant advantages in complex tasks. However, real-world scenarios are often partially observable (e.g., image-based RL and multi-agent RL) and contain non-stationary dynamics. To address the challenges of Partially Observable Markov Decision Process (POMDP) settings, we propose a novel memory-guided world model named PO-Dreamer. In addition to the current observation, PO-Dreamer adaptively extracts meaningful cues from memory that help model the environmental dynamics. The features of the current observation and the memory are then fused by a fusion mechanism to predict state transitions and future rewards. Extensive experiments on both single-agent (Atari 100K) and multi-agent (SMAC) benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance compared to strong existing baselines.
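
To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of one way such a memory-guided world-model step could be structured: attention-based retrieval of cues from a buffer of past latent states, fusion with current-observation features, and heads for transition and reward prediction. All names and design choices here (`MemoryGuidedWorldModel`, `latent_dim`, multi-head-attention retrieval, MLP fusion) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MemoryGuidedWorldModel(nn.Module):
    """Hypothetical sketch of a memory-guided world model; the paper's exact
    architecture and hyperparameters are not specified in the abstract."""

    def __init__(self, obs_dim: int, latent_dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)        # observation encoder
        # Adaptive retrieval: current features attend over stored memory states.
        self.retrieve = nn.MultiheadAttention(latent_dim, n_heads, batch_first=True)
        # Fusion of current-observation features with the retrieved memory cues.
        self.fuse = nn.Sequential(nn.Linear(2 * latent_dim, latent_dim), nn.ReLU())
        self.transition = nn.Linear(latent_dim, latent_dim)  # next-state predictor
        self.reward_head = nn.Linear(latent_dim, 1)          # reward predictor

    def forward(self, obs: torch.Tensor, memory: torch.Tensor):
        # obs: (B, obs_dim); memory: (B, T, latent_dim) of past latent states.
        z = self.encoder(obs).unsqueeze(1)                      # (B, 1, D)
        cues, _ = self.retrieve(z, memory, memory)              # (B, 1, D)
        h = self.fuse(torch.cat([z, cues], dim=-1)).squeeze(1)  # fused state (B, D)
        return self.transition(h), self.reward_head(h)

# Example usage with random tensors in place of real environment data.
model = MemoryGuidedWorldModel(obs_dim=64)
obs, mem = torch.randn(8, 64), torch.randn(8, 16, 256)
next_z, reward = model(obs, mem)
```

In this reading, cross-attention weights past latent states by their relevance to the current observation, which matches the abstract's "adaptively extract meaningful cues" description; the paper may well use a different retrieval or fusion mechanism.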
Primary Area: reinforcement learning
Submission Number: 8779