PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: An algorithm that leverages privileged information to address partial observability in safe reinforcement learning.
Abstract: Partial observability presents a significant challenge for safe reinforcement learning, as it impedes the identification of potential risks and rewards. Leveraging specific types of privileged information during training to mitigate the effects of partial observability has yielded notable empirical successes. In this paper, we propose Asymmetric Constrained Partially Observable Markov Decision Processes (ACPOMDPs) to theoretically examine the advantages of incorporating privileged information. Building upon ACPOMDPs, we propose the Privileged Information Guided Dreamer, a model-based safe reinforcement learning approach that leverages privileged information to enhance the agent's safety and performance through privileged representation alignment and an asymmetric actor-critic structure. Our empirical results demonstrate that our approach significantly outperforms existing methods in terms of safety and task-centric performance. Moreover, compared with alternative privileged model-based reinforcement learning methods, our approach achieves superior performance and is easier to train.
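The two ingredients named in the abstract, privileged representation alignment and an asymmetric actor-critic structure, can be illustrated with a minimal sketch. This is not the paper's implementation: the tiny linear "networks", dimensions, and the MSE alignment loss are illustrative assumptions; the key point is only which inputs each component is allowed to see.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # hypothetical tiny one-layer network standing in for a real encoder/head
    W = rng.normal(0, 0.1, size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ W)

OBS_DIM, STATE_DIM, LATENT_DIM, ACT_DIM = 8, 12, 4, 2

obs_encoder = linear(OBS_DIM, LATENT_DIM)     # sees only the partial observation
priv_encoder = linear(STATE_DIM, LATENT_DIM)  # sees the full state (training only)

actor = linear(LATENT_DIM, ACT_DIM)           # deployed policy: observation-only latent
critic = linear(LATENT_DIM + STATE_DIM, 1)    # training-only critic: also gets privileged state

def alignment_loss(obs, state):
    # privileged representation alignment: pull the observation latent
    # toward the privileged latent (MSE is one common choice, assumed here)
    z_obs, z_priv = obs_encoder(obs), priv_encoder(state)
    return float(np.mean((z_obs - z_priv) ** 2))

obs = rng.normal(size=OBS_DIM)     # what the agent observes
state = rng.normal(size=STATE_DIM) # privileged simulator state, unavailable at deployment

action = actor(obs_encoder(obs))                           # asymmetric: actor uses obs only
value = critic(np.concatenate([obs_encoder(obs), state]))  # critic additionally uses state
loss = alignment_loss(obs, state)
```

At deployment only `obs_encoder` and `actor` are needed; the privileged critic and the alignment loss exist solely to shape training.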
Lay Summary: Teaching robots to stay safe when they can’t see everything is tricky. Because their sensors are limited, they may miss important dangers, making risky situations hard to avoid. We address this challenge by giving the learning process access to extra sensors, equipping robots with a better sense of danger and thereby improving their ability to avoid it. We developed a framework called *Asymmetric Constrained Partially Observable Markov Decision Processes (ACPOMDPs)* to show how extra sensors available during training can help robots better understand their surroundings. Using this framework, we created an algorithm called the *Privileged Information Guided Dreamer*, which aligns the extra sensor data with the robot’s internal understanding of the world, improving its ability to avoid dangers and perform tasks safely. Our work improves how robots use extra sensor data to understand their environment, and it achieves state-of-the-art performance on safe navigation tasks, demonstrating the effectiveness of our approach in real-world scenarios.
Link To Code: https://github.com/hggforget/PIGDreamer
Primary Area: Reinforcement Learning->Deep RL
Keywords: Reinforcement Learning; World Models; Safe Reinforcement Learning; Model-based Reinforcement Learning; Privileged Learning
Submission Number: 3835